LFD Book Forum  

#1
04-11-2013, 05:47 PM
grozhd (Junior Member; Join Date: Apr 2013; Posts: 4)
How does Hoeffding's inequality work if we have a fixed sample?

As far as I understand, Hoeffding's inequality bounds the probability of a "big" difference between a random variable (the mean of N independent random variables) and a number (its expected value).
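(Concretely, I have in mind the form from the lectures, with \nu the sample mean of the N variables and \mu the expected value: for any tolerance \epsilon > 0,

\mathbb{P}\big[\,|\nu - \mu| > \epsilon\,\big] \;\le\; 2e^{-2\epsilon^2 N}.)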

Since it is a probabilistic inequality, it says that if we run the experiment, we will see this "big" difference only with some limited probability. And "running the experiment" presupposes that we have a probability space X with a probability measure P; then some outcome x from X occurs, and our random variables take some value on that x.

But when we talk about learning, we have a fixed data set, so it seems to me that we don't observe a new outcome x each time; we reuse the one fixed outcome that corresponds to our training set. And as we move from the first hypothesis to the second, we can no longer use Hoeffding's inequality unless we draw another random sample. For example, if the sample is finite and my hypothesis set is large, I can easily walk from one hypothesis to another until I find one that agrees with the target function on that sample. But, as the professor said in the first lecture, I haven't learned anything; I have just memorized the data. Yet in lecture 2 we seem to do exactly that while claiming the inequality should still hold, which is obviously not true for my example.

I know I have stated this fuzzily, but that is because I don't really understand the formalism here: what is our probability space, what is P, what are the random variables? That is why I can't formulate my question precisely. But I have a feeling that reusing the same data set somehow violates the probabilistic picture.

So if anyone could clarify this, that would be great. Thanks in advance.
#2
04-12-2013, 12:34 AM
Rahul Sinha (Junior Member; Join Date: Apr 2013; Posts: 9)
Re: How does Hoeffding's inequality work if we have a fixed sample?

Quote:
Originally Posted by grozhd
[grozhd's first post, quoted in full above]

I will try to summarize what I know; please correct me if I don't make sense or if I am wrong.

Let's take the simplest example: coin tossing. Assume you know the coin is unbiased, i.e. P(h) = P(t) = 0.5. You give the coin to me and pay me to toss it 10,000 times and diligently record the outcome of each toss. These outcomes form X, your data set. Now you ask me to report P(h). Being a hired hand, I have no idea what it is, but I have learned that mu, the fraction of heads in X, closely tracks P(h), thanks to a mysterious equation called Hoeffding's inequality. So I report mu, which comes out to about 0.4998. You are happy that your assessment of fairness was correct, and you reward me with more dollars.
Let us decode the experiment. X is not the whole input space, just a fraction of it: card(X) = 10,000, not infinity. We usually have neither the time nor the energy to flip a coin infinitely many times, so we rely on a statistical guarantee to make predictions. The guarantee, in our case, is that for a large N (10,000), the probability of P(h) being far from mu is small. How small? That is exactly what good old Hoeffding's inequality tells us. It is still possible that mu = 0.05 while P(h) = 0.5; the probability of such a deviation is tiny but strictly positive, so the "whatever can go wrong will go wrong" philosophy might apply, but in general it won't. For example, it is very rare to meet a girl who is 9 ft tall, but that doesn't mean it cannot happen in the lifetime of our species. So if an alien sampled all humans alive today, it could predict the height of any Homo sapiens it encounters quite accurately (though it might miss the 9 ft person yet to be born in the year 2890).
Yes, you did "memorize" mu, but if N is large it cannot be too far off; that is the key. If you had sampled just 5 people from Asia, though, you would mostly be wrong in Europe!
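If it helps, here is a rough numerical sketch of that claim (plain numpy; N = 10,000 tosses, a tolerance of 0.01 and 2,000 repetitions are just illustrative choices): repeat the whole 10K-toss experiment many times and compare how often |mu - P(h)| exceeds the tolerance with what Hoeffding's inequality allows.

Code:
import numpy as np

rng = np.random.default_rng(0)

N = 10_000       # tosses per experiment (the "10K" data set)
eps = 0.01       # tolerance on |mu - P(h)|
trials = 2_000   # how many times the whole 10K-toss experiment is repeated
p_heads = 0.5    # the coin is fair

# Number of heads in each 10K-toss experiment, divided by N, gives mu.
mu = rng.binomial(N, p_heads, size=trials) / N

# How often does the sample frequency miss the true P(h) by more than eps?
empirical = np.mean(np.abs(mu - p_heads) > eps)

# Hoeffding's bound on that probability: 2 * exp(-2 * eps^2 * N).
bound = 2 * np.exp(-2 * eps**2 * N)

print(f"empirical P[|mu - 0.5| > {eps}] ~ {empirical:.4f}")  # around 0.045
print(f"Hoeffding bound                 = {bound:.4f}")      # about 0.271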

Quote:
Originally Posted by grozhd
But when we talk about learning, we have a fixed data set, so it seems to me that we don't observe a new outcome x each time; we reuse the one fixed outcome that corresponds to our training set.
I would think of it differently. Each toss of the coin is one x; X is 10,000 x's. So you don't have one x, you have 10,000 of them, and the assumption is that each x is drawn i.i.d. from P. Say your sample frequency comes out to MU = 0.4998. Now do another 10,000 tosses and compute MU again. Do another 10,000... repeat this indefinitely. Each batch of 10,000 tosses gives you one value of MU, so MU itself can be treated as a random variable, and it is approximately normally distributed around E(MU), the population mean. Hoeffding's inequality bounds how far any one realized MU = mu (read: the data set of 10,000 tosses you paid me for) is likely to be from E(MU), i.e. from P(h).

Google "sampling distribution of the mean"; that is the central idea.
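A quick sketch of that idea, under the same illustrative choices as above: collect one MU per repeated 10K-toss experiment and look at how the MU values concentrate around E(MU) = 0.5.

Code:
import numpy as np

rng = np.random.default_rng(1)

N = 10_000       # tosses per experiment
repeats = 5_000  # how many MU values we collect
p_heads = 0.5

# One MU per repetition of the 10K-toss experiment.
mus = rng.binomial(N, p_heads, size=repeats) / N

print(f"mean of the MU values: {mus.mean():.4f}")  # close to E(MU) = 0.5
print(f"std of the MU values:  {mus.std():.4f}")   # close to 0.5 / sqrt(N) = 0.005
print(f"theoretical std:       {0.5 / np.sqrt(N):.4f}")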
#3
04-12-2013, 04:19 PM
grozhd (Junior Member; Join Date: Apr 2013; Posts: 4)
Re: How does Hoeffding's inequality work if we have a fixed sample?

Quote:
Originally Posted by Rahul Sinha
Let us decode the experiment. X is not the whole input space, just a fraction of it: card(X) = 10,000, not infinity. We usually have neither the time nor the energy to flip a coin infinitely many times.
I don't understand why you assume the whole input space should be infinite, and I also don't see how this coin tossing relates to learning. In particular, what is X and what is the target function?

Quote:
Yes, you did "memorize" mu, but if N is large it cannot be too far off; that is the key.
I believe that isn't true. For example, no matter how much data you give me, I can always come up with a polynomial of degree equal to the number of training examples that is perfect in sample. Suppose you gave me 20 points (x, y) from a linear function and I fit a degree-20 polynomial through them. I think it will be really bad out of sample.
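Here is a rough sketch of the kind of thing I mean (numpy; I use degree 19, which is already enough to pass through 20 points exactly, and I add a little noise to the y-values, since with perfectly noiseless linear data the interpolating polynomial would just reproduce the line):

Code:
import numpy as np

rng = np.random.default_rng(0)

# 20 training points from a (slightly noisy) linear target y = 2x + 1.
x_train = np.linspace(-1, 1, 20)
y_train = 2 * x_train + 1 + rng.normal(scale=0.1, size=20)

# Degree 19 is enough to pass (essentially) exactly through all 20 points.
coeffs = np.polyfit(x_train, y_train, deg=19)

# Fresh test points from the same interval and the same target.
x_test = rng.uniform(-1, 1, 1000)
y_test = 2 * x_test + 1 + rng.normal(scale=0.1, size=1000)

in_sample_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
out_sample_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

print(f"in-sample MSE:     {in_sample_mse:.2e}")   # essentially zero
print(f"out-of-sample MSE: {out_sample_mse:.2e}")  # typically orders of magnitude larger

In sample the fit is essentially perfect, but between and near the outermost points the polynomial oscillates wildly, so the out-of-sample error is typically orders of magnitude larger. That is the kind of memorization I am worried about.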

Also, I found a question similar to mine and posted a reply here.
#4
04-12-2013, 09:58 PM
Rahul Sinha (Junior Member; Join Date: Apr 2013; Posts: 9)
Re: How does Hoeffding's inequality work if we have a fixed sample?

Quote:
Originally Posted by grozhd
Also, I found a question similar to mine and posted a reply here.
Alright. To avoid cross-posting, I will post an edited version of my explanation above in the other thread.