LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 2 (http://book.caltech.edu/bookforum/forumdisplay.php?f=131)
-   -   Hoeffding’s inequality (http://book.caltech.edu/bookforum/showthread.php?t=880)

itooam 07-18-2012 01:05 AM

Hoeffding’s inequality
 
I really struggled to understand Hoeffding’s inequality in the lectures and now we have a question on it so I really would like to understand. Please can anybody help?...

P[|E_{\rm in}(g) - E_{\rm out}(g)| > \epsilon] \le 2Me^{-2\epsilon^2 N}

My questions are:
1) From the lectures I understood E_out(g) to be the "true" error (the error you would measure if you had access to the full population of data). In reality, when we apply a machine learning technique, we do so because we have limited data, so I fail to understand how this formula can be applied in practice. Perhaps I misunderstood; perhaps E_out is the error found from a separate set of data (not from the whole population)? I.e., I have learned elsewhere that it is a good idea when "learning from data" to have a training set, a cross-validation set and a test set. Maybe E_out actually refers to the cross-validation set?

2) What is a suitable value for epsilon? This wasn't mentioned in the lectures (unless I missed it). Can it be calculated?

itooam 07-18-2012 03:02 AM

Re: Hoeffding’s inequality
 
Just watching lecture 2 again and it is all starting to make sense... but I'm sure I will still be back with more questions later, lol.

itooam 07-18-2012 05:24 AM

Re: Hoeffding’s inequality
 
Going back to my questions, I now feel I grasp the principles. Because the right-hand side of the inequality doesn't depend on E_out (or E_in), it tells us the probability of poor generalisation purely in terms of N, epsilon and M: a large N (big sample size) tends to mean better generalisation, while a large M has the adverse effect, causing poor generalisation. But M will be infinite for learning techniques such as perceptron learning.

Because we were told M will be infinite for "learning techniques", I assume we will be shown a way to get around this problem in future lectures?

I believe epsilon sets the tolerance, and the inequality then gives us a confidence level as to whether the sample error (E_in) generalises to the overall population. Please could somebody confirm the following:

If I set epsilon to, say, 0.01 in Hoeffding's inequality, is this equivalent to asking "what is the probability that E_in DOES NOT match E_out to within 0.01, given the parameters N and M"?
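
For concreteness, the right-hand side of the inequality can be evaluated directly. Here is a minimal Python sketch (the function name and the sample values of N, M and epsilon are mine, chosen just for illustration):

Code:

import math

def hoeffding_bound(N, epsilon, M=1):
    # Upper bound on P[|Ein - Eout| > epsilon] for M hypotheses and N examples
    return 2 * M * math.exp(-2 * epsilon ** 2 * N)

# A single fixed hypothesis (M = 1) and N = 1000 examples:
print(hoeffding_bound(1000, 0.05))  # ~0.0135, i.e. |Ein - Eout| <= 0.05 with prob >= ~98.7%
print(hoeffding_bound(1000, 0.01))  # ~1.64, and a bound above 1 tells us nothing

Note that demanding epsilon = 0.01 with N = 1000 makes the bound vacuous: a tighter tolerance needs a much larger sample before the inequality says anything useful.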

yaser 07-18-2012 09:12 AM

Re: Hoeffding’s inequality
 
Quote:

Originally Posted by itooam (Post 3499)
From the lectures I understood E_out(g) to be the "true" error (the error you would measure if you had access to the full population of data). In reality, when we apply a machine learning technique, we do so because we have limited data, so I fail to understand how this formula can be applied in practice.

In a practical machine learning situation, we don't have E_{\rm out}, but we estimate it using a test set (a set of fresh points), as we did in Homework 1 with the perceptron. The accuracy of that estimate will be discussed in detail when we talk about validation.
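
A minimal sketch of such a test-set estimate, in the spirit of the Homework 1 perceptron experiment (the uniform input distribution on [-1, 1]^2 follows that setup; the weight vectors below are made up for illustration):

Code:

import numpy as np

rng = np.random.default_rng(0)

def linear_classifier(w):
    # The +/-1 classifier h(x) = sign(w . [1, x1, x2])
    return lambda X: np.sign(np.c_[np.ones(len(X)), X] @ w)

def estimate_eout(g, f, n_test=10_000):
    # Estimate Eout(g) as the rate of disagreement with the target f on fresh points
    X = rng.uniform(-1, 1, size=(n_test, 2))
    return float(np.mean(g(X) != f(X)))

f = linear_classifier(np.array([0.1, -1.0, 0.5]))  # hypothetical target
g = linear_classifier(np.array([0.0, -0.9, 0.6]))  # hypothetical final hypothesis
print(estimate_eout(g, f))

The more fresh points you use, the tighter this estimate gets; Hoeffding's inequality with M = 1 quantifies exactly how tight.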

itooam 07-18-2012 02:37 PM

Re: Hoeffding’s inequality
 
Thanks yaser, I will look forward to that lecture (btw, I'm loving the course :) a big thank you)

rakhlin 07-21-2012 12:57 PM

Re: Hoeffding’s inequality
 
I got all the numbers, but I don't understand the problem.

First, c_rand and c_min aren't particular coins. They're different random coins on every run, so we can't derive their properties from the experiment. Formally, we can speak only of c_1.

Second, all three nu's (their averages) converge to theoretical limits (not sure I can post the actual numbers): two equal values and a different one. In this sense all three "coins" satisfy Hoeffding's inequality; that is, they generate data that probabilistically converge to certain limits (not 0.5 in the case of c_min). But we can't select all three as an answer.

So I would be grateful for a clarification of this problem.

yaser 07-21-2012 02:12 PM

Re: Hoeffding’s inequality
 
Quote:

Originally Posted by rakhlin (Post 3568)
First, c_rand and c_min aren't particular coins.

This is precisely why there is a question about whether or not they satisfy the Hoeffding inequality. The fact that they are not fixed coins does not prevent us from recording whether we got heads or tails from them in each iteration, so we can compute the frequency of heads. Since all the coins are identical in terms of the probability of getting a head or a tail, we have that same probability of heads regardless of which coin comes out.

Now the question is: when you compute the statistics of heads and tails in such an experiment, do these statistics happen to fall under the bound that the basic Hoeffding inequality predicts for a fixed coin, notwithstanding that this experiment is not based on a fixed coin?
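
One way to see the answer is to simulate the experiment. A sketch, assuming the homework setup of 1000 fair coins flipped 10 times each (the epsilon value and the run count are arbitrary choices of mine):

Code:

import numpy as np

rng = np.random.default_rng(0)

def one_run(n_coins=1000, n_flips=10):
    # Flip n_coins fair coins n_flips times; return (nu_1, nu_rand, nu_min)
    heads = rng.integers(0, 2, size=(n_coins, n_flips)).sum(axis=1)
    nu = heads / n_flips
    return nu[0], nu[rng.integers(n_coins)], nu.min()

nus = np.array([one_run() for _ in range(1000)])    # columns: c_1, c_rand, c_min
eps = 0.3
empirical = (np.abs(nus - 0.5) > eps).mean(axis=0)  # P[|nu - mu| > eps] per "coin"
bound = 2 * np.exp(-2 * eps ** 2 * 10)              # basic Hoeffding for one fixed coin, N = 10
print("c_1, c_rand, c_min:", empirical, " bound:", bound)

Comparing each column against the single-coin bound shows which of the three "coins" behaves like the fixed coin that the basic inequality describes.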

rakhlin 07-21-2012 03:30 PM

Re: Hoeffding’s inequality
 
Thank you for explanation, Professor.

Quote:

Originally Posted by yaser (Post 3571)
do these statistics happen to fall under the bound that the basic Hoeffding inequality predicts for a fixed coin, notwithstanding that this experiment is not based on a fixed coin?

c_min by construction has no equivalent experiment based on a fixed coin.
All three "coins" satisfy Hoeffding's inequality, in my opinion. Of course, c_min is a special case: the prediction for it is different. That much is apparent. Maybe I am overthinking the problem.

hesam.creative 07-22-2012 03:35 AM

Re: Hoeffding’s inequality
 
Hi,
I have some trouble grasping Hoeffding's inequality and the idea of treating coins as marbles.
First of all, the probability of error over the whole space is 0.5, since the coins are fair, while the probability of error on the sample is, say, the frequency of heads.
Hoeffding's inequality states that, given an experiment, the probability of how badly we learned does not depend on how the experiment turned out; however, with finitely many flips, the right-hand side does depend on the number of flips and on the threshold epsilon.
So, in our problem, after running the experiment we have some coins with known in-sample error. We choose one of them; now how should we use Hoeffding's inequality?
Thank you for putting your time into this.

orlando 07-22-2012 12:53 PM

Re: Hoeffding’s inequality
 
My point of view on this question is that we should take Hoeffding's inequality as a form of convergence guarantee. This is an obvious observation of course, but seeing things in this light lets you compare the averages to the true expected value and conclude that if some value is close enough (say, within eps = 0.01), then it satisfies the inequality, because the probability of something bad happening is veeery close to zero.

On the subject of the coin experiments, it doesn't matter that the coin you select is different every time; in reality the coins should be indistinguishable from each other, since they are all fair coins tossed independently.

