LFD Book Forum  

LFD Book Forum > Book Feedback - Learning From Data > Chapter 1 - The Learning Problem

#1
09-18-2015, 11:26 PM
henry2015
Member
Join Date: Aug 2015
Posts: 31
Hoeffding Inequality

Hi,

On page 22, it says, "the hypothesis h is fixed before you generate the data set, and the
probability is with respect to random data sets D; we emphasize that the assumption "h is fixed before you generate the data set" is critical to the validity of this bound".

A few questions:
1. Does the "data set" in "generate the data set" refer to the marbles (which form the data set D) that we pick randomly from the jar? Or does it refer to the set of outputs (red/green) of h(x) on D?
2. It keeps mentioning "h is fixed before you generate the data set". Does it mean that in machine learning, the set of h's should be predefined before seeing any training data, and that no h can be added to the set after seeing the training data?

Thanks!
#2
09-18-2015, 11:38 PM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Hoeffding Inequality

Quote:
Originally Posted by henry2015 View Post
1. Does the "data set" in "generate the data set" refer to the marbles (which form the data set D) that we pick randomly from the jar? Or does it refer to the set of outputs (red/green) of h(x) on D?
The target f is assumed to be fixed, and since h is also fixed, the colors of all the marbles are fixed; generating the data set means picking the marbles that end up in the sample.
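As a concrete illustration, here is a minimal simulation sketch of that bin picture (my own Python/numpy illustration, not from the book; the target, the hypothesis, and the numbers are made up):

Code:
import numpy as np

# Bin model sketch: f and h are both fixed *before* any data is generated,
# so the "color" of every marble (whether h(x) = f(x) or not) is already determined.
rng = np.random.default_rng(1)

f = lambda x: np.sign(x)         # made-up fixed target
h = lambda x: np.sign(x - 0.1)   # made-up fixed hypothesis

# mu ~ E_out(h): fraction of a large reference sample where h disagrees with f
X_bin = rng.uniform(-1, 1, size=1_000_000)
mu = np.mean(h(X_bin) != f(X_bin))

# Generating a data set D just means drawing N marbles from that bin;
# nu = E_in(h) is the fraction of "red" (misclassified) marbles in the sample.
N = 100
D = rng.uniform(-1, 1, size=N)
nu = np.mean(h(D) != f(D))

print(mu, nu)   # nu tracks mu for the fixed h, which is what Hoeffding quantifies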

Quote:
2. It keeps mentioning "h is fixed before you generate the data set". Does it mean that in machine learning, the set of h's should be predefined before seeing any training data, and that no h can be added to the set after seeing the training data?
This is the assumption that the theory is based on. If one wants to add hypotheses after seeing the data and still apply the theory, one should take the set of hypotheses to include all potential hypotheses that may be added (whatever the data set may be).
__________________
Where everyone thinks alike, no one thinks very much
#3
09-19-2015, 12:36 AM
henry2015
Member
Join Date: Aug 2015
Posts: 31
Re: Hoeffding Inequality

Thanks for your quick reply Professor!

Now, I wonder why "we cannot just plug in g for h in the Hoeffding inequality". Given that g is one of the h's, and that for each h the Hoeffding inequality gives a valid upper bound on P[|E_in(h) - E_out(h)| > ε], then even if g is picked after we look at the outputs of all the h's, g is still one of the h's. So the Hoeffding inequality should still be valid for g. No?
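(For reference, the bound in the book for a single, fixed h is P[|E_in(h) - E_out(h)| > ε] ≤ 2e^(-2ε²N), for any tolerance ε > 0 and sample size N.)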

Thanks!
#4
09-19-2015, 03:56 AM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Hoeffding Inequality

Quote:
Originally Posted by henry2015 View Post
Now, I wonder why "we cannot just plug in g for h in the Hoeffding inequality". Given that g is one of the h's, and that for each h the Hoeffding inequality gives a valid upper bound on P[|E_in(h) - E_out(h)| > ε], then even if g is picked after we look at the outputs of all the h's, g is still one of the h's. So the Hoeffding inequality should still be valid for g. No?
This is the main point of this part. Take the coin flipping example, with each of 1000 fair coins flipped 10 times. Hoeffding applies to each coin, right? Now if we pick "g" to be the coin that produced the most heads, we lose the Hoeffding guarantee because the small probability of bad behavior for each coin accumulates into a not-so-small probability of bad behavior of some coin (which we picked deliberately because it behaved badly).
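If it helps to see the numbers, here is a minimal simulation sketch of that coin experiment (my own Python/numpy illustration using the numbers above; the 0.4 tolerance and the number of repetitions are arbitrary choices):

Code:
import numpy as np

# 1000 fair coins, each flipped 10 times; repeat the whole experiment many times.
rng = np.random.default_rng(0)
n_runs, n_coins, n_flips = 10_000, 1000, 10

bad_fixed = 0    # a single coin fixed before flipping (Hoeffding applies to it)
bad_picked = 0   # the coin picked *after* flipping because it produced the most heads

for _ in range(n_runs):
    heads = rng.binomial(n_flips, 0.5, size=n_coins)   # heads out of 10 for each coin
    nu_fixed = heads[0] / n_flips                      # first coin, chosen in advance
    nu_picked = heads.max() / n_flips                  # coin with the most heads
    bad_fixed += abs(nu_fixed - 0.5) > 0.4             # "bad": all 10 flips came up the same
    bad_picked += abs(nu_picked - 0.5) > 0.4           # for the picked coin: some coin got 10 heads

print("fixed coin :", bad_fixed / n_runs)    # roughly 0.002 -- within the Hoeffding bound of about 0.08
print("picked coin:", bad_picked / n_runs)   # roughly 0.6 -- the guarantee is gone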
__________________
Where everyone thinks alike, no one thinks very much
#5
10-01-2015, 05:22 AM
henry2015
Member
Join Date: Aug 2015
Posts: 31
Re: Hoeffding Inequality

Hi Professor,

I just have a hard time understanding how choosing a hypothesis can change a theorem -- the Hoeffding inequality.

Let's say P[|E_in(h1) - E_out(h1)| > ε] ≤ P1 and P[|E_in(h2) - E_out(h2)| > ε] ≤ P2. We choose h2 to be g. Then the bound P2 for h2 is no longer true?

I sort of understand your example: because we pick the run of coin flipping that produces the most heads, a plot of the results suggests that the Hoeffding inequality doesn't apply. But the Hoeffding inequality is a statement about probability, so reality might be off a bit anyway.

Maybe I am headed in the wrong direction?
#6
10-01-2015, 10:38 PM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Hoeffding Inequality

It's a subtle point. There is "cherry picking" if we fish for a sample that has certain properties after many trials, instead of having a sample that is fairly drawn from a fixed hypothesis.

Statements involving probability are tricky because they don't guarantee a particular outcome, just the likelihood of getting that outcome. Therefore, changing the game to allow more trials or different conditions would change the probabilities.
__________________
Where everyone thinks alike, no one thinks very much
#7
10-04-2015, 09:15 PM
henry2015
Member
Join Date: Aug 2015
Posts: 31
Re: Hoeffding Inequality

When the book states "we cannot just plug in g for h in the Hoeffding inequality", it means that the Hoeffding inequality is still true for g, since g is one of the h's, but the inequality seems to fail for g because we are cherry picking.

Just like flipping an unbiased coin 1 million times: we should see about 500K heads and 500K tails, but we might have the "bad luck" of seeing 1 million heads, even though P(head) is still 0.5.

Do I interpret correctly?

Thanks a lot!
#8
10-05-2015, 08:39 PM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Hoeffding Inequality

Let me rephrase it. Let's say (like in Hoeffding) that a rare event has a probability of at most 1% of happening. If we make repeated independent trials looking for that event, each trial still gives a probability of at most 1% for that event to happen. Now, if we actively search for the case when that rare event actually happened among these many trials, we will succeed in finding it with probability much more than 1%.
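A quick back-of-the-envelope illustration of that point (my own numbers, just to make it concrete):

Code:
# A rare event with probability at most 1% in any single trial; if we actively
# search for it over 100 independent trials, the chance of finding it at least
# once is much larger than 1%.
p_single = 0.01
trials = 100
p_at_least_once = 1 - (1 - p_single) ** trials
print(p_at_least_once)   # about 0.63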
__________________
Where everyone thinks alike, no one thinks very much
#9
03-07-2016, 12:19 AM
pouramini
Member
Join Date: Mar 2016
Posts: 16
Re: Hoeffding Inequality

I also have the same questions, and I have read your replies.

Please consider whether I have drawn the correct conclusions:

1- We cannot plug "g" in for "h" in the inequality because g depends on the sample we have already selected; in other words, we choose it deliberately (as the h with the lowest error inside D), like selecting the bin with the minimum frequency of heads.

So, what if we select "g" randomly (say, from a uniform distribution over the h's), or select a bin randomly? Can we then use the Hoeffding inequality for "g", or must we still account for M, the size of H?

2- Which of the following interpretations of equation (1.6) is correct?
  • The only function that has zero error both inside and outside D is f, so if the number of hypotheses increases, the chance of selecting "f" (the correct function, or a better approximation) becomes lower. (However, I feel this is not what you are saying.)
  • Or maybe, when we increase the number of hypotheses, we increase the chance that the data behave differently inside and outside of D. For example, if we limit the hypothesis set to a single hypothesis, we may have a high error, but we lower the difference between E_in and E_out. Likewise, if we use only one feature, we have limited the number of hypotheses, so when we evaluate h outside D it is not flexible enough to show minor errors, and E_out stays closer to E_in?
=====================================

Second question:

In "h is fixed before you generate the data set"
I also can't understand your emphasis on "before".

Do you want to say that h shouldn't change?
because I feel h is independent from D then "before" or "after" doesn't mean much. We don't need to have an h in mind to be able to generate D, we can select D, then decide which h to use, then evaluate h over D, but we should use the same h for the test set, right? or maybe h is used somehow in generating D?! Anyway, I think you may mean it should be selected independently from D
#10
03-07-2016, 03:28 AM
ntvy95
Member
Join Date: Jan 2016
Posts: 37
Re: Hoeffding Inequality

I think you can take a look at MaciekLeks' post for the experimental results of Exercise 1.10 (in the book).

In my understanding: g is the final hypothesis, which is known only after the data set is generated (because the choice of the final hypothesis is based on that specific data set). Before the data set is generated, all the information we have about g is that g is one of the hypotheses in H (hence the M). h is a specific hypothesis, an element of H; I don't think we are selecting h, I think we are selecting g instead.
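(For reference, equation (1.6) in the book is the union bound over the M hypotheses in H: P[|E_in(g) - E_out(g)| > ε] ≤ 2M·e^(-2ε²N), for any ε > 0.)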


Tags
fixed hypothesis, hoeffding inequality
