LFD Book Forum How does Hoeffding's inequality work if we have a fixed sample?

#1
04-11-2013, 05:47 PM
 grozhd Junior Member Join Date: Apr 2013 Posts: 4
How does Hoeffding's inequality work if we have a fixed sample?

As far as I understand, Hoeffding's inequality tells us about the probability of a "big" difference between a random variable (the mean of N independent random variables) and a number (its expected value).

Since it is a probabilistic inequality, it means that if we perform the experiment, we will see this "big" difference only with some limited probability. And "performing the experiment" assumes that we have a probability space X with a probability function P; then some outcome x from X occurs, and our random variables take some value on this x.

But when we are talking about learning we have a fixed data set, so it seems to me that we don't observe a new outcome x each time; we use the one fixed outcome that corresponds to our learning data set. And as we go from the first hypothesis to the second, we can't use Hoeffding's inequality any more unless we select another random sample. For example, if the sample is finite and I have a big hypothesis set, I can easily go from one hypothesis to another and find one which is equal to the target function on that sample. But as the professor said in the first lecture, I didn't learn anything; I just memorized the data. Yet in lecture 2 it seems that we do the same thing and say that the inequality should still hold, which is obviously not true for my example.
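To make the worry concrete, here is a quick Python sketch (all numbers made up): with one fixed sample and enough random "hypotheses", some hypothesis will agree with the sample almost perfectly, even though each individual hypothesis on its own obeys Hoeffding.

```python
import random

random.seed(0)
N = 10        # size of the one fixed sample
H = 100_000   # number of hypotheses tried against that same sample

# the fixed sample: N fair-coin flips
sample = [random.randint(0, 1) for _ in range(N)]

# each "hypothesis" is just a random guess of the N outcomes;
# any single one of them satisfies Hoeffding, but we keep the best
best = 0
for _ in range(H):
    h = [random.randint(0, 1) for _ in range(N)]
    best = max(best, sum(a == b for a, b in zip(h, sample)))

print(best / N)  # agreement of the best hypothesis on the fixed sample
```

With 100,000 tries against only 10 bits, the best agreement is essentially always 1.0, yet nothing was learned: that's exactly the memorization concern.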

I know that I may have stated this fuzzily, but that's because I don't actually understand the formalism here: what our probability space is, what P is, and what the random variables are; that's why I can't formulate my question precisely. But I have a feeling that reusing this same data set somehow violates the probabilistic picture.

So, if anyone could clarify that it would be great, thanks in advance.
#2
04-12-2013, 12:34 AM
 Rahul Sinha Junior Member Join Date: Apr 2013 Posts: 9
Re: How does Hoeffding's inequality work if we have a fixed sample?

Quote:
 Originally Posted by grozhd As far as I understand Hoeffding's inequality tells us about the probability of "big" differences between a random variable value (mean of N independent r.v.'s) and a number (its expected value). [...]

I will try to summarize what I know; you are encouraged to correct me if I don't make sense or if I am wrong.

Let's take the easiest example: coin tossing. Assume you know that a coin is unbiased (P(h) = P(t) = 0.5). You give the coin to me and pay me to toss it 10K times and diligently record the outcome of each toss. These outcomes form X, your data set. Now you ask me to report P(h). Well, I, being a simple laborer, have no clue. But I have learned that the sample frequency mu of X closely follows P(h), thanks to a mysterious equation called Hoeffding's inequality. So I report it to you: about 0.4998. You are happy that your assessment of non-bias was correct, and you reward me with more dollars.
Let us try to decode the experiment: X is not the whole input space but just a fraction of it; card(X) = 10,000, not infinity. Usually we have neither the time nor the energy to flip a coin an infinite number of times, so we rely on a statistical assumption to play God and make predictions. The assumption, in our case, is that for a large N (10K), the probability of P(h) being far away from mu is small. How small? That's given by our good old Hoeffding's inequality. It is possible, however, that mu is 0.05 while P(h) = 0.5! A probability of 0.00001, for example, is very small, but it is greater than 0, so the "what can go wrong will go wrong" philosophy might apply. In general, though, it won't. For example, you can say it is very rare to find a girl who is 9 ft tall, but that doesn't mean it cannot happen in the lifetime of our species. So if an alien sampled all humans alive today, it could predict the height of any Homo sapiens it is to encounter fairly accurately (though it might miss on some 9 ft miss yet to be born in the year 2890).
Yes, you did memorize mu, but if N is large it cannot be too bad! That's the key. If you sampled just 5 Asians, though, you would be wrong most of the time in Europe!
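As a sanity check, here is a short Python sketch of the single-experiment version (sample size and tolerance picked arbitrarily): one run of 10K fair-coin tosses, with the Hoeffding bound for the chosen tolerance.

```python
import math
import random

random.seed(1)
N = 10_000
p = 0.5  # true P(h) for a fair coin

# one experiment: N tosses, record the sample frequency mu
mu = sum(random.randint(0, 1) for _ in range(N)) / N

# Hoeffding: P(|mu - p| > eps) <= 2 * exp(-2 * eps^2 * N)
eps = 0.02
bound = 2 * math.exp(-2 * eps**2 * N)

print(f"mu = {mu}, |mu - p| = {abs(mu - p):.4f}, bound = {bound:.2e}")
```

For N = 10,000 and eps = 0.02 the bound is 2e^(-8), below 0.001, so deviations larger than 0.02 are rare, though never impossible.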

Quote:
 Originally Posted by grozhd But when we are talking about learning we have a fixed data set, so it seems to me that we don't observe a new outcome x each time; we use the one fixed outcome that corresponds to our learning data set.
I would think of things differently. x describes each toss of the coin; X is 10K x's. So you don't have one x, you have 10K of them, the assumption being that each x is i.i.d. and drawn from P. Say the value of mu on this data set is 0.4998. You do another 10K tosses and get another mu. You do another 10K... and you repeat this an infinite number of times. Each iteration of 10K tosses gives you one mu value, so mu can be considered a random variable, and it will be approximately normally distributed about E(mu), the population mean. The inequality gives you a good bound on the relationship between any such sample mu (read: the data set of 10K tosses for which you paid me) and E(mu), i.e. P(h).

Google "sampling distribution of the mean"! That's the central idea.
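That sampling-distribution idea can be simulated directly; a sketch (the run counts are arbitrary): repeat the 10K-toss experiment many times and look at how the resulting mus concentrate around 0.5.

```python
import random
import statistics

random.seed(2)
N, runs = 10_000, 200

# each run of N tosses yields one value of mu; collect many of them
mus = [sum(random.randint(0, 1) for _ in range(N)) / N for _ in range(runs)]

# the mus cluster around E(mu) = 0.5 with spread ~ sqrt(p(1-p)/N) = 0.005
print(statistics.mean(mus), statistics.stdev(mus))
```

The mean of the mus lands very close to 0.5 and their standard deviation is close to 0.005, matching the theoretical spread of the sampling distribution.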
#3
04-12-2013, 04:19 PM
 grozhd Junior Member Join Date: Apr 2013 Posts: 4
Re: How does Hoeffding's inequality work if we have a fixed sample?

Quote:
 Originally Posted by Rahul Sinha Let us try and decode the experiment: X is not the whole input space but just a fraction of it. card(X) = 10000 and not infinity. Usually, we will neither have the time nor the energy to flip a coin infinite number of times.
I don't understand why you suppose that the whole input space should be infinite, and neither do I understand how this coin tossing relates to learning: in particular, what the hypothesis is and what the target function is.

Quote:
 Yes, you did memorize mu, but if N is large it cannot be too bad! That's the key
I believe that's not true. For example, no matter how much data you give me, I can always come up with a polynomial of high enough degree (one less than the number of training examples) which is perfect in sample. Suppose you gave me 20 points from a linear function and I fit a degree-19 polynomial through them; I think it will be really bad out of sample.
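A quick Python sketch of this interpolation example (noise level and evaluation points chosen arbitrarily): the unique degree-19 polynomial through 20 noisy points from a straight line is perfect in sample and wild between the points.

```python
import random

random.seed(3)

# 20 points from a noisy straight line y = x
xs = list(range(20))
ys = [x + random.gauss(0, 0.1) for x in xs]

def lagrange(x, xs, ys):
    """Evaluate the unique degree-19 interpolating polynomial at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# perfect in sample: the interpolant passes through every training point
in_sample_err = max(abs(lagrange(x, xs, ys) - y) for x, y in zip(xs, ys))

# terrible between the sample points, especially near the edges
out_err = max(abs(lagrange(x, xs, ys) - x) for x in [0.5, 1.5, 17.5, 18.5])

print(in_sample_err, out_err)
```

The in-sample error is exactly zero by construction, while between the training points the interpolant oscillates enormously: memorization, not learning.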

Also, I found a question similar to mine and posted a reply here.
#4
04-12-2013, 09:58 PM
 Rahul Sinha Junior Member Join Date: Apr 2013 Posts: 9
Re: How does Hoeffding's inequality work if we have a fixed sample?

Quote:
 Originally Posted by grozhd Also, I found a question similar to mine and posted a reply here.
Alright. To avoid cross-posting, I will post an edited version of my explanation above in the other thread.
