LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 1 - The Learning Problem (http://book.caltech.edu/bookforum/forumdisplay.php?f=108)
-   -   role of P(X) ? (http://book.caltech.edu/bookforum/showthread.php?t=4188)

kokrah 04-11-2013 01:34 AM

role of P(X) ?
 
The Hoeffding bound for the model H in Chapter 1 only requires the assumption that the input examples are a random sample from the bin, so we can generalize from the sample error.

What role does the distribution on X play? It appears to me that we don't need it (at least the way the issue of feasibility is set up in Chapter 1), i.e., true mismatch ≈ sample mismatch.

Thanks.

yaser 04-11-2013 01:48 AM

Re: role of P(X) ?
 
Quote:

Originally Posted by kokrah (Post 10336)
The Hoeffding bound for the model H in Chapter 1 only requires the assumption that the input examples are a random sample from the bin, so we can generalize from the sample error.

What role does the distribution on X play? It appears to me that we don't need it (at least the way the issue of feasibility is set up in Chapter 1), i.e., true mismatch ≈ sample mismatch.

Thanks.

We need the existence of the input probability distribution P so that "a random sample" becomes well defined, but we don't need any particular P for that, since any P will correspond to a legitimate \mu for the bin.

nkatz 04-11-2013 06:07 AM

Re: role of P(X) ?
 
So can you say that P(X) populates the bin and determines \mu? In that case we would be sampling from P(X); is this correct?

kokrah 04-11-2013 07:04 AM

Re: role of P(X) ?
 
I see.

Example:
Y|x = x + \epsilon is the target.
X \sim F(x) is the input distribution.

Consider either
1. \epsilon \sim N(0,1) and X \sim N(0,1), or
2. \epsilon \sim N(0,1) and X \sim t(1), where t(1) is the t-distribution with one degree of freedom.

I know from my stat classes that in case 1 a linear model is actually "correct".
(This is great, since we usually know nothing about f.)
So in this case the distribution of X plays a role in selecting H, and hence in
reducing the in-sample error (assuming the quadratic loss function).

Questions:
In either case 1 or 2, is the interpretation/computation of the sample error the same?
I am a little confused, since the overall true error
(which we hope the sample error approximates) is defined based on the joint
distribution of (X, Y), which depends on the distribution of X.

Thanks. I hope this class/book can clear up some misconceptions about the theoretical framework of the learning problem once and for all :)
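As a sketch of the mechanics (my own toy simulation, not from the book): the target is y = x + \epsilon with \epsilon \sim N(0,1), and only the input distribution changes between case 1 (X \sim N(0,1)) and case 2 (X \sim t(1)). The least-squares fit and the in-sample squared error are computed the same way in both cases; the function name and sample size are my own choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

def in_sample_error(x):
    """Fit y = a*x + b by least squares and return the in-sample mean squared error.

    The target is y = x + epsilon with epsilon ~ N(0, 1), as in the example above.
    """
    y = x + rng.normal(size=n)
    A = np.column_stack([x, np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2)

# Case 1: X ~ N(0,1).   Case 2: X ~ t(1), heavy-tailed.
err1 = in_sample_error(rng.normal(size=n))
err2 = in_sample_error(rng.standard_t(df=1, size=n))
print(err1, err2)  # both close to the noise variance 1
```

In both cases the computation of the in-sample error is identical; P(X) only changes where the sample points land, not how the error is measured.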

Elroch 04-11-2013 08:55 AM

Re: role of P(X) ?
 
Quote:

Originally Posted by kokrah (Post 10336)
The Hoeffding bound for the model H in Chapter 1 only requires the assumption that the input examples are a random sample from the bin, so we can generalize from the sample error.

What role does the distribution on X play? It appears to me that we don't need it (at least the way the issue of feasibility is set up in Chapter 1), i.e., true mismatch ≈ sample mismatch.

Thanks.

As I see it, the theory of generalisation relies on the fact that the distribution P(X, y) that generates the examples used to build a model (both training and test data) is the same as the distribution of the examples we are trying to learn about. There are two things that can go wrong: either P(X) may be different, or P(y | X) may be different. In the first case, the examples may be concentrated in some subset of the input space, and this may be a region where the models work better. Obviously the second case can also lead to misleading conclusions.

[This may appear to be a trivial assumption when sampling from some populations, but it is likely to be non-trivial in many cases where we are attempting to infer future behavior from past behavior in a system whose characteristics may change]
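A minimal sketch of the first failure mode (my own toy construction, not from the book): fix h and f, and let only P(X) differ between the data used to evaluate h and the data it is deployed on.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: f is the true label, h a fixed hypothesis.
f = lambda x: np.where(x >= 0, 1, -1)
h = lambda x: -np.ones_like(x)          # h always predicts -1

def error(x):
    """Fraction of points where h disagrees with f, under the sample's P(X)."""
    return np.mean(h(x) != f(x))

train_x = rng.normal(loc=-2.0, size=50_000)   # P(X) concentrated where h happens to be right
deploy_x = rng.normal(loc=+2.0, size=50_000)  # a different P(X), concentrated where h is wrong

print(error(train_x))   # small
print(error(deploy_x))  # large: same h and f, only P(X) changed
```

The error measured under one P(X) says nothing about the error under a different P(X); the Hoeffding guarantee only connects the sample to the same distribution it was drawn from.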

yaser 04-11-2013 09:13 AM

Re: role of P(X) ?
 
Quote:

Originally Posted by nkatz (Post 10339)
So can you say that P(X) populates the bin and determines mu? In that case we would be sampling P(X); is this correct?

P affects the value of \mu because it affects the probability of each {\bf x}\in{\cal X}, so the probability of red marbles (i.e., {\bf x}'s where h({\bf x})\ne f({\bf x})) changes accordingly. We are sampling according to P, except that when we look at the bin abstraction of the situation, we only care about the color of {\bf x} not its identity, so the binary-event probability \mu is sufficient to characterize sampling of the marbles.
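A small simulation of this point (my own sketch; h, f, and the two choices of P are hypothetical): with h and f fixed, changing P changes the probability \mu of drawing a red marble, even though the bin of marbles is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical f and h on X = [0, 1): a point is a "red marble" when h(x) != f(x),
# which here happens exactly for x < 0.3.
f = lambda x: x < 0.3
h = lambda x: np.zeros_like(x, dtype=bool)

def red_fraction(x):
    """Fraction of red marbles in a sample drawn according to some P."""
    return np.mean(h(x) != f(x))

u = rng.uniform(0, 1, 100_000)
nu_uniform = red_fraction(u)       # P uniform on [0,1): mu = 0.3
nu_skewed = red_fraction(u ** 2)   # P skewed toward 0:  mu = sqrt(0.3) ≈ 0.55

print(nu_uniform, nu_skewed)
```

Hoeffding applies in both cases (the sample frequency \nu concentrates near \mu), but which \mu it concentrates on is determined by P.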

netweavercn 02-06-2014 03:57 PM

Re: role of P(X) ?
 
Quote:

Originally Posted by yaser (Post 10342)
P affects the value of \mu because it affects the probability of each {\bf x}\in{\cal X}, so the probability of red marbles (i.e., {\bf x}'s where h({\bf x})\ne f({\bf x})) changes accordingly. We are sampling according to P, except that when we look at the bin abstraction of the situation, we only care about the color of {\bf x} not its identity, so the binary-event probability \mu is sufficient to characterize sampling of the marbles.


Thanks for Prof. Yaser's reply. A quick question: as we are sampling according to P, how does P affect each {\bf x}\in{\cal X}? In other words, does P determine {\bf x}\in{\cal X}, the sampling process, or both?

yaser 02-07-2014 03:04 AM

Re: role of P(X) ?
 
Quote:

Originally Posted by netweavercn (Post 11641)
Thanks for Prof. Yaser's reply. A quick question: as we are sampling according to P, how does P affect each {\bf x}\in{\cal X}? In other words, does P determine {\bf x}\in{\cal X}, the sampling process, or both?

The answer would be both, since the probability of each x affects the sampling process (not the mechanism of it, but the frequency of different outcomes that it produces).

giridhar1202 09-07-2015 11:50 AM

Re: role of P(X) ?
 
Quote:

Originally Posted by yaser (Post 10342)
P affects the value of \mu because it affects the probability of each {\bf x}\in{\cal X}, so the probability of red marbles (i.e., {\bf x}'s where h({\bf x})\ne f({\bf x})) changes accordingly. We are sampling according to P, except that when we look at the bin abstraction of the situation, we only care about the color of {\bf x} not its identity, so the binary-event probability \mu is sufficient to characterize sampling of the marbles.


But isn't \mu fixed when you choose a particular hypothesis h? [Because the number of red marbles is equal to the number of points in the input space where the hypothesis (h) and the target function (f) disagree, and this, in my opinion, has nothing to do with the probability distribution.]

Please clarify.

Thanks,
Giridhar.

yaser 09-07-2015 07:06 PM

Re: role of P(X) ?
 
Quote:

Originally Posted by giridhar1202 (Post 12029)
But isn't \mu fixed when you choose a particular hypothesis h? [Because the number of red marbles is equal to the number of points in the input space where the hypothesis (h) and the target function (f) disagree, and this, in my opinion, has nothing to do with the probability distribution]

The number of marbles, or the fraction of marbles, is a simplification that makes the experiment more intuitive. In reality, each marble has a probability of being picked, namely P({\bf x}), that may differ from marble to marble. This affects the total probability of red marbles, which is \mu.

To take a simple example, let's say that there are only two marbles in the bin, one red and one green, but the red marble has a higher probability of being picked than the green marble. In this case, \mu is not 1/2, even though the fraction of red marbles is 1/2.
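This two-marble example can be checked numerically (a minimal sketch; the 0.8/0.2 picking probabilities are my own illustrative choice, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two marbles: index 0 is red, index 1 is green, but the red marble is picked
# with probability 0.8 and the green with probability 0.2.
is_red = np.array([True, False])
draws = rng.choice(2, size=100_000, p=[0.8, 0.2])

nu = np.mean(is_red[draws])
print(nu)   # ≈ 0.8 = mu, even though the fraction of red marbles in the bin is 1/2
```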



The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.