LFD Book Forum > Course Discussions > Online LFD course > Homework 2
#1  07-18-2012, 02:05 AM
itooam (Senior Member, Join Date: Jul 2012, Posts: 100)
Hoeffding’s inequality

I really struggled to understand Hoeffding’s inequality in the lectures and now we have a question on it so I really would like to understand. Please can anybody help?...

P[|Ein(g) − Eout(g)| > ε] ≤ 2M·exp(−2ε²N)

My questions are:
1) from the lectures I understood Eout(g) to be the "true" error (if you had access to the full population of data). In reality, when we are applying a machine learning technique, we do so because we have limited data and therefore I fail to understand how this formula can be applied? Perhaps I misunderstood, perhaps Eout is the error found from a seperate set of data (not from the whole population)? I.e., I have learn't elswehere that it is a good idea when "learning from data" to have a training set, a cross validation set and a test set. Maybe Eout actually refers to the cross validation set?

2) What is a suitable value for epsilon? This wasn't mentioned in the lectures (unless I missed it). Can it be calculated?
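On the second question, a quick numerical sketch may help (my own illustration, not from the lectures): epsilon is not something you calculate, it is a tolerance you choose, and the right-hand side 2M·exp(−2ε²N) then tells you how likely a violation of that tolerance is for given N and M:

```python
import math

def hoeffding_bound(eps, n, m=1):
    """Upper bound on P[|Ein - Eout| > eps] for m hypotheses and n samples."""
    return 2 * m * math.exp(-2 * eps**2 * n)

# Larger n tightens the bound; larger m (more hypotheses) loosens it.
print(hoeffding_bound(0.1, 100))         # ~0.271
print(hoeffding_bound(0.1, 1000))        # ~4.1e-09
print(hoeffding_bound(0.1, 1000, m=50))  # ~2.1e-07, 50x looser than m=1
```

Note that for m > 1 this is the union bound version from the lectures; the numbers above are just illustrative choices of eps and n.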
#2  07-18-2012, 04:02 AM
itooam (Senior Member, Join Date: Jul 2012, Posts: 100)
Re: Hoeffding’s inequality

Just watching lecture 2 again and it's all starting to make sense... but I'm sure I will still be back with more questions later lol
#3  07-18-2012, 06:24 AM
itooam (Senior Member, Join Date: Jul 2012, Posts: 100)
Re: Hoeffding’s inequality

Going back to my questions, I now feel I grasp the principles. Because the right-hand side of the inequality doesn't rely on Eout (or Ein) to calculate, it tells us the probability of generalisation based only on the values of n, epsilon and M. A large value of n (big sample size) tends to mean better generalisation, whereas a large value of M has the adverse effect, causing poor generalisation. But M will be infinite when we deal with learning techniques such as perceptron learning.

Since we were told M will be infinite for "learning techniques", I believe we will be shown a way to get around this problem in future lectures?

I believe the epsilon gives us a confidence interval as to whether the sample (Ein) generalises to the overall population. Please could somebody confirm the following:

if I set epsilon to, say, 0.01 in Hoeffding’s inequality, is this equivalent to asking "what is the probability that Ein DOES NOT generalise to Eout to within 0.01, given the parameters n and M"?

Last edited by itooam; 07-18-2012 at 06:34 AM. Reason: incorrect wording in my last paragraph
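On the 0.01 reading above: epsilon is a tolerance on the gap |Ein − Eout|, not a confidence percentage, so ε = 0.01 asks for the probability that Ein misses Eout by more than 0.01. A quick sketch (my own, single hypothesis so M = 1) shows that such a tight tolerance needs a lot of data before the bound says anything at all:

```python
import math

eps = 0.01  # tolerance on |Ein - Eout|
for n in (1_000, 10_000, 100_000):
    bound = 2 * math.exp(-2 * eps**2 * n)
    # Any probability bound >= 1 is vacuous; with eps = 0.01 that is
    # the case until n reaches several thousand samples.
    print(n, min(bound, 1.0))
```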
#4  07-18-2012, 10:12 AM
yaser (Caltech, Join Date: Aug 2009, Location: Pasadena, California, USA, Posts: 1,477)
Re: Hoeffding’s inequality

Quote:
Originally Posted by itooam
from the lectures I understood Eout(g) to be the "true" error (if you had access to the full population of data). In reality, when we are applying a machine learning technique, we do so because we have limited data and therefore I fail to understand how this formula can be applied?
In a practical machine learning situation, we don't have E_out, but we estimate it using a test set (a set of fresh points), as we did in Homework 1 with the perceptron. The accuracy of that estimate will be discussed in detail when we talk about validation.
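As a toy illustration of that point (my own sketch, with a made-up 1-D target and hypothesis, not the homework setup): the error measured on fresh test points estimates E_out, and here the true E_out is known by construction, so the two can be compared:

```python
import random

random.seed(1)

# Hypothetical target and learned hypothesis (both invented for illustration):
# they disagree exactly on the interval (0.3, 0.35], so the true E_out is 0.05
# when x is uniform on [0, 1).
def target(x):
    return x > 0.3

def hypothesis(x):
    return x > 0.35

test_x = [random.random() for _ in range(10_000)]  # fresh test points
e_test = sum(target(x) != hypothesis(x) for x in test_x) / len(test_x)
print(e_test)  # close to the true E_out of 0.05
```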
__________________
Where everyone thinks alike, no one thinks very much
#5  07-18-2012, 03:37 PM
itooam (Senior Member, Join Date: Jul 2012, Posts: 100)
Re: Hoeffding’s inequality

Thanks yaser, I will look forward to that lecture (btw, I'm loving the course, a big thank you)
#6  07-21-2012, 01:57 PM
rakhlin (Member, Join Date: Jun 2012, Posts: 24)
Re: Hoeffding’s inequality

I got all the numbers, but I don't understand the problem.

First, c_rand and c_min aren't particular coins. They are many different random coins, so we can't derive their properties from the experiment. Formally, we can speak of c_1 only.

Second, all three nu's (their averages) converge to theoretical limits (I'm not sure I can post the actual numbers: two equal values and a different one). In this sense all three "coins" satisfy Hoeffding’s inequality, in that they generate data that probabilistically converge to certain limits (not 0.5 in the case of c_min). But we can't select all three as an answer.

So I would be grateful for clarification of this problem.
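For reference, a minimal simulation of the experiment as I understand it (parameters assumed from the homework statement: 1,000 fair coins, 10 flips each, averaged over many runs):

```python
import random

random.seed(0)

def run_experiment(num_coins=1000, flips=10):
    """One run: flip every coin, then record nu (fraction of heads) for the
    first coin, a randomly chosen coin, and the coin with the fewest heads."""
    heads = [sum(random.random() < 0.5 for _ in range(flips))
             for _ in range(num_coins)]
    return (heads[0] / flips,
            random.choice(heads) / flips,
            min(heads) / flips)

runs = 2000
totals = [0.0, 0.0, 0.0]
for _ in range(runs):
    for i, nu in enumerate(run_experiment()):
        totals[i] += nu

# The averages of nu_1 and nu_rand sit near 0.5; the average of nu_min
# sits far below 0.5, because c_min is chosen after looking at the data.
print([round(t / runs, 3) for t in totals])
```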
#7  07-21-2012, 03:12 PM
yaser (Caltech, Join Date: Aug 2009, Location: Pasadena, California, USA, Posts: 1,477)
Re: Hoeffding’s inequality

Quote:
Originally Posted by rakhlin
First, c_rand and c_min aren't certain coins.
This is precisely why there is a question about whether or not they satisfy the Hoeffding inequality. The fact that they are not fixed coins does not prevent us from recording whether we got heads or tails from them in each iteration, so we can compute the frequency of heads. Since all the coins are identical in terms of the probability of getting a head or a tail, we have that same probability of heads regardless of which coin comes out.

Now the question is, when you compute the statistics of heads and tails in such an experiment, do these statistics happen to fall under the bound that the basic Hoeffding inequality predicts for a fixed coin, notwithstanding that this experiment is not based on a fixed coin?
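That question can be checked numerically (my own sketch; 1,000 fair coins and N = 10 flips assumed from the homework): estimate P[|ν − 0.5| > ε] under each selection rule and compare it to the fixed-coin bound 2·exp(−2ε²N):

```python
import math
import random

random.seed(0)

def one_run(num_coins=1000, flips=10):
    """Return nu for the first coin, a random coin, and the min-heads coin."""
    heads = [sum(random.random() < 0.5 for _ in range(flips))
             for _ in range(num_coins)]
    return heads[0] / flips, random.choice(heads) / flips, min(heads) / flips

eps, flips, runs = 0.3, 10, 2000
bound = 2 * math.exp(-2 * eps**2 * flips)  # Hoeffding bound for one fixed coin
violations = [0, 0, 0]  # runs with |nu - 0.5| > eps, per selection rule
for _ in range(runs):
    for i, nu in enumerate(one_run()):
        violations[i] += abs(nu - 0.5) > eps

# c_1 and c_rand stay under the bound; c_min exceeds it badly, so the
# fixed-coin bound does not cover a data-dependent choice of coin.
for name, count in zip(("c_1", "c_rand", "c_min"), violations):
    print(name, count / runs, "vs bound", round(bound, 3))
```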
__________________
Where everyone thinks alike, no one thinks very much
#8  07-21-2012, 04:30 PM
rakhlin (Member, Join Date: Jun 2012, Posts: 24)
Re: Hoeffding’s inequality

Thank you for the explanation, Professor.

Quote:
Originally Posted by yaser
do these statistics happen to fall under the bound that the basic Hoeffding inequality predicts for a fixed coin, notwithstanding that this experiment is not based on a fixed coin?
c_min by construction has no equivalent experiment based on a fixed coin.
All three "coins" satisfy Hoeffding’s inequality in my opinion. Of course, c_min is a special story; it has a different prediction. That is apparent. Maybe I am overthinking the problem.
#9  07-22-2012, 04:35 AM
hesam.creative (Junior Member, Join Date: Jul 2012, Posts: 7)
Re: Hoeffding’s inequality

Hi,
I have some trouble grasping Hoeffding’s inequality and the idea of treating coins as marbles.
First of all, the probability of error in the whole space is 0.5, since the coins are fair, while the probability of error in the sample space is the frequency of heads, say.
Hoeffding’s inequality states that, given an experiment, the probability of how badly we learned does not depend on how the experiment performed; however, in the case of finite flipping, the right-hand side depends on it and on the threshold.
So, in our problem, after running the experiment we have some coins with known in-sample error. We choose one of them; now how should we use Hoeffding’s inequality?
Thank you for putting your time into this.
#10  07-22-2012, 01:53 PM
orlando (Junior Member, Join Date: Jul 2012, Posts: 4)
Re: Hoeffding’s inequality

My view of these questions is that we should take Hoeffding’s inequality as a form of convergence guarantee. This is an obvious observation, of course, but seeing things in this light lets you compare the averages to the true expected value and conclude that if some value is close enough (say with eps = 0.01), then it satisfies the inequality, because the probability of something bad happening is very close to zero.

On the subject of the coin experiments, it doesn't matter that the coin you select is always different; in reality they should be indistinguishable from each other, since they are all fair coins tossed independently.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.