LFD Book Forum  

LFD Book Forum > Book Feedback - Learning From Data > Chapter 1 - The Learning Problem

  #1  
04-11-2013, 02:34 AM
kokrah (Junior Member; joined Apr 2013; 3 posts)
role of P(X)?

The Hoeffding bound for the model H in Chapter 1 only requires the assumption that the input examples are a random sample from the bin, so we can generalize from the sample error.

What role does the distribution on X play? It appears to me that we don't need it (at least the way the issue of feasibility is set up in Chapter 1), i.e., true mismatch \approx sample mismatch.

Thanks.
  #2  
04-11-2013, 02:48 AM
yaser (Caltech; joined Aug 2009; Pasadena, California, USA; 1,474 posts)
Re: role of P(X)?

Quote (originally posted by kokrah):
The Hoeffding bound for the model H in Chapter 1 only requires the assumption that the input examples are a random sample from the bin, so we can generalize from the sample error.

What role does the distribution on X play? It appears to me that we don't need it (at least the way the issue of feasibility is set up in Chapter 1), i.e., true mismatch \approx sample mismatch.

Thanks.
We need the existence of the input probability distribution P so that "a random sample" becomes well defined. However, we don't need any particular P for that, since any P will correspond to a legitimate \mu for the bin.
__________________
Where everyone thinks alike, no one thinks very much
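To see this point concretely, here is a small simulation sketch (the ten-point input space, the disagreement set, and the two distributions below are made up for illustration): whatever P we choose, the sample frequency \nu of "red" points tracks the total red probability \mu that P induces.

```python
import random

random.seed(0)

# Hypothetical setup: input space X = {0, ..., 9}; the hypothesis h
# disagrees with the target f exactly on the points {0, 1, 2} ("red").
points = list(range(10))
disagreement = {0, 1, 2}

def mu(p):
    """Total probability of red marbles (points where h != f) under P."""
    return sum(p[x] for x in disagreement)

def nu(p, n):
    """Empirical disagreement frequency on n i.i.d. draws from P."""
    draws = random.choices(points, weights=p, k=n)
    return sum(x in disagreement for x in draws) / n

# Two different input distributions over the same ten points.
uniform = [0.1] * 10
skewed = [0.05, 0.05, 0.40] + [0.50 / 7] * 7  # more mass on the red points

for p in (uniform, skewed):
    print(f"mu = {mu(p):.3f}, nu = {nu(p, 100_000):.3f}")
```

In both runs \nu lands close to \mu, which is all Hoeffding needs; the particular choice of P only sets the value of \mu.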
  #3  
04-11-2013, 07:07 AM
nkatz (Junior Member; joined Apr 2013; 4 posts)
Re: role of P(X)?

So can you say that P(X) populates the bin and determines \mu? In that case we would be sampling from P(X); is this correct?
  #4  
04-11-2013, 08:04 AM
kokrah (Junior Member; joined Apr 2013; 3 posts)
Re: role of P(X)?

I see.

Example:
Y|x = x + \epsilon is the target.
X \sim F(x) is the input distribution.

Let either
1. \epsilon \sim N(0,1) and X \sim N(0,1), or
2. \epsilon \sim N(0,1) and X \sim t(1), where t(1) is the t-distribution with one degree of freedom.

I know from my stat classes that in case 1 a linear model is actually "correct" (this is great, since we usually know nothing about f). So in this case the distribution of X plays a role in selecting H, and hence in reducing the in-sample error (assuming the quadratic loss function).

Questions: In either case 1 or 2, is the interpretation/computation of the sample error the same? I am a little confused, since the overall true error (which we hope the sample error approximates) is defined based on the joint distribution of (X,Y), which depends on the distribution of X.

Thanks. I hope this class/book can clear up some misconceptions about the theoretical framework of the learning problem once and for all.
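A quick simulation of the two cases above (the sample size and seed are arbitrary) suggests that the sample error is computed in exactly the same way in both, and here even comes out roughly the same, because the target Y = x + \epsilon is linear regardless of how X is distributed:

```python
import math
import random

random.seed(1)

def in_sample_error(xs, ys):
    """Least-squares fit of y = a + b*x, then average quadratic loss."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n

def draw_normal():
    return random.gauss(0, 1)

def draw_t1():
    # t(1) is the standard Cauchy; sample it by inverting the CDF.
    return math.tan(math.pi * (random.random() - 0.5))

n = 10_000
for name, draw in [("N(0,1)", draw_normal), ("t(1)", draw_t1)]:
    xs = [draw() for _ in range(n)]
    ys = [x + random.gauss(0, 1) for x in xs]  # target: Y = x + epsilon
    print(f"X ~ {name}: in-sample error = {in_sample_error(xs, ys):.3f}")
```

In both cases the in-sample error hovers around 1, the variance of \epsilon; what changes with the distribution of X is the joint distribution of (X,Y), not the recipe for computing the sample error.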
  #5  
04-11-2013, 09:55 AM
Elroch (Invited Guest; joined Mar 2013; 143 posts)
Re: role of P(X)?

Quote (originally posted by kokrah):
The Hoeffding bound for the model H in Chapter 1 only requires the assumption that the input examples are a random sample from the bin, so we can generalize from the sample error.

What role does the distribution on X play? It appears to me that we don't need it (at least the way the issue of feasibility is set up in Chapter 1), i.e., true mismatch \approx sample mismatch.

Thanks.
As I see it, the theory of generalisation relies on the fact that the distribution P(X, y) that generates the examples used to build a model (both training and test data) is the same as the distribution of examples we are trying to learn. There are two things that can go wrong: either P(X) may be different, or P(y | X) may be different. In the first case, the examples may be concentrated in some subset of the input space, and this may be a region where the models work better. Obviously, the second case can also lead to misleading conclusions.

[This may appear to be a trivial assumption when sampling from some populations, but it is likely to be non-trivial in many cases where we are attempting to infer future behavior from past behavior in a system whose characteristics may change.]
  #6  
04-11-2013, 10:13 AM
yaser (Caltech; joined Aug 2009; Pasadena, California, USA; 1,474 posts)
Re: role of P(X)?

Quote (originally posted by nkatz):
So can you say that P(X) populates the bin and determines \mu? In that case we would be sampling from P(X); is this correct?
P affects the value of \mu because it affects the probability of each {\bf x}\in{\cal X}, so the probability of red marbles (i.e., {\bf x}'s where h({\bf x})\ne f({\bf x})) changes accordingly. We are sampling according to P, except that when we look at the bin abstraction of the situation, we only care about the color of {\bf x}, not its identity, so the binary-event probability \mu is sufficient to characterize the sampling of the marbles.
__________________
Where everyone thinks alike, no one thinks very much
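As a tiny worked illustration of how P affects \mu (the four-point input space, the values of f and h, and the distribution P below are all made up): the raw fraction of red points need not equal \mu once P is non-uniform.

```python
# Hypothetical 4-point input space with target f and hypothesis h.
f = {0: +1, 1: +1, 2: -1, 3: -1}      # target values (assumed)
h = {0: +1, 1: -1, 2: -1, 3: +1}      # h disagrees with f on x = 1 and x = 3

P = {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}  # non-uniform input distribution

red = [x for x in f if h[x] != f[x]]  # the red marbles
red_fraction = len(red) / len(f)      # 2 of 4 points are red: 0.5
mu = sum(P[x] for x in red)           # probability-weighted: 0.2

print(red_fraction, mu)
```

Half the points are red, yet \mu = 0.2 because P puts most of its mass on a point where h and f agree.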
  #7  
02-06-2014, 04:57 PM
netweavercn (Junior Member; joined Jan 2014; 7 posts)
Re: role of P(X)?

Quote (originally posted by yaser):
P affects the value of \mu because it affects the probability of each {\bf x}\in{\cal X}, so the probability of red marbles (i.e., {\bf x}'s where h({\bf x})\ne f({\bf x})) changes accordingly. We are sampling according to P, except that when we look at the bin abstraction of the situation, we only care about the color of {\bf x}, not its identity, so the binary-event probability \mu is sufficient to characterize the sampling of the marbles.

Thanks for your reply, Prof. Yaser. A quick question: since we are sampling according to P, how does P affect each {\bf x}\in{\cal X}? In other words, does P determine {\bf x}\in{\cal X}, the sampling process, or both?
  #8  
02-07-2014, 04:04 AM
yaser (Caltech; joined Aug 2009; Pasadena, California, USA; 1,474 posts)
Re: role of P(X)?

Quote (originally posted by netweavercn):
Thanks for your reply, Prof. Yaser. A quick question: since we are sampling according to P, how does P affect each {\bf x}\in{\cal X}? In other words, does P determine {\bf x}\in{\cal X}, the sampling process, or both?
The answer would be both, since the probability of each x affects the sampling process (not the mechanism of it, but the frequency of different outcomes that it produces).
__________________
Where everyone thinks alike, no one thinks very much
  #9  
09-07-2015, 12:50 PM
giridhar1202 (Junior Member; joined Sep 2015; 2 posts)
Re: role of P(X)?

Quote (originally posted by yaser):
P affects the value of \mu because it affects the probability of each {\bf x}\in{\cal X}, so the probability of red marbles (i.e., {\bf x}'s where h({\bf x})\ne f({\bf x})) changes accordingly. We are sampling according to P, except that when we look at the bin abstraction of the situation, we only care about the color of {\bf x}, not its identity, so the binary-event probability \mu is sufficient to characterize the sampling of the marbles.

But isn't \mu fixed when you choose a particular hypothesis h? (Because the number of red marbles is equal to the number of points in the input space where the hypothesis h and the target function f disagree, and this, in my opinion, has nothing to do with the probability distribution.)

Please clarify.

Thanks,
Giridhar.
  #10  
09-07-2015, 08:06 PM
yaser (Caltech; joined Aug 2009; Pasadena, California, USA; 1,474 posts)
Re: role of P(X)?

Quote (originally posted by giridhar1202):
But isn't \mu fixed when you choose a particular hypothesis h? (Because the number of red marbles is equal to the number of points in the input space where the hypothesis h and the target function f disagree, and this, in my opinion, has nothing to do with the probability distribution.)
The number of marbles, or the fraction of marbles, is a simplification that makes the experiment more intuitive. In reality, each marble has a probability of being picked, namely P({\bf x}), that may be different for other marbles. This affects the total probability of red marbles, which is \mu.

To take a simple example, let's say that there are only two marbles in the bin, one red and one green, but the red marble has a higher probability of being picked than the green marble. In this case, \mu is not 1/2 even though the fraction of red marbles is 1/2.
__________________
Where everyone thinks alike, no one thinks very much
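The two-marble example can be checked with a short simulation (the 0.8/0.2 picking probabilities are chosen arbitrarily for illustration):

```python
import random

random.seed(2)

# One red and one green marble; red is picked with probability 0.8.
marbles = ["red", "green"]
p_pick = [0.8, 0.2]

draws = random.choices(marbles, weights=p_pick, k=100_000)
nu = draws.count("red") / len(draws)

print(f"fraction of red marbles in the bin: {1 / len(marbles)}")
print(f"empirical frequency of picking red: {nu:.3f}")  # close to mu = 0.8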
Reply With Quote


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.