04-12-2013, 11:16 PM
yaser
Re: Is the Hoeffding Inequality really valid for each bin despite non-random sampling

Originally Posted by grozhd
I have the same concern as scottedwards2000 and I still don't understand how it is resolved.

As I understand it, the bin symbolizes the probability space of all possible inputs X. A sample of balls drawn randomly from the bin symbolizes our training set \bar{x_0}=(x_1,...,x_N).

Now we pick a hypothesis h_1 (suppose we are running PLA). We look at our sample \bar{x_0}, compute \nu_1 and use Hoeffding's inequality. We do one step of PLA and come up with a new hypothesis h_2, which automatically gives us \nu_2, and the professor is saying that we can write down the Hoeffding inequality for \nu_2 and \mu_2?

I guess we can. But that inequality tells us something about the random variable \nu_2: X^N \rightarrow [0,1], i.e. about P\left( |\nu_2(\bar{x}) - \mu_2 | > \varepsilon \right) where \bar{x} is a random sample. But it seems like we are using \nu_2(\bar{x_0}), where \bar{x_0} is hardly random with respect to h_2, since we built h_2 using that sample.

Here is an example that illustrates my point: say we tried some random h_1, compared it with the target function f on our training sample (x_1,...,x_N), and wrote down Hoeffding's inequality. Now let's construct h_2 as follows: h_2(x_i) = f(x_i), \ i=1,...,N, and h_2(x) = h_1(x) for x \neq x_i, \ i=1,...,N. Let's write down Hoeffding's inequality for this hypothesis. If we are indeed using \nu_2(\bar{x_0}), then here it would be equal to 1, since h_2 = f on \bar{x_0}, and we would have: P\left( |1 - \mu_2| > \varepsilon \right) is small. So we would be saying with high probability that h_2 does an excellent job out of sample, though we didn't change it much from h_1. This example shouldn't be correct, right? And if it isn't, how is the one with PLA correct?
It is a subtle point, so let me try to explain it in the terms you outlined. Let us take the sample {\cal D} (what you call \bar{x_0}; I'll follow the book notation). Now evaluate \nu for all hypotheses h in your model {\cal H}. We didn't start at one h and move to another; we just evaluated \nu for all h\in{\cal H}. The question is: does the Hoeffding inequality apply to each of these h's by itself? The answer is clearly yes, since each of them could in principle be the hypothesis you started with (which you called h_1).
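To make the single-bin statement concrete, here is a small simulation sketch (my own illustration, not code from the course; the function name and parameter values are mine). One fixed hypothesis is modeled as a bin with true frequency \mu = 0.5, and we estimate P(|\nu - \mu| > \varepsilon) over many fresh samples; it stays below the Hoeffding bound 2e^{-2\varepsilon^2 N}.

```python
import math
import random

def hoeffding_check(mu=0.5, N=100, eps=0.1, trials=20_000, seed=0):
    """Estimate P(|nu - mu| > eps) for ONE fixed hypothesis, modeled as
    a bin whose true 'red marble' frequency is mu.  Each trial draws a
    fresh sample of size N and computes the sample frequency nu."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        nu = sum(rng.random() < mu for _ in range(N)) / N
        if abs(nu - mu) > eps:
            bad += 1
    return bad / trials

empirical = hoeffding_check()
bound = 2 * math.exp(-2 * 0.1**2 * 100)  # 2*exp(-2), about 0.27
print(empirical, bound)  # the empirical probability stays below the bound
```

Because the hypothesis is fixed before the sample is drawn, the bound holds no matter which h plays that role.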

Hoeffding states what the probabilities are before the sample is drawn. When you choose one of these hypotheses because of its small \nu, as in the scenario you point out, the probability that applies now is conditioned on the sample having small \nu. We could try to get a conditional version of Hoeffding to deal with this situation, or we can get a version of Hoeffding that applies regardless of which h we choose and how we choose it. The latter is what we did using the union bound.
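The selection effect can also be seen numerically. In this sketch (again my own illustration; names and parameter values are assumptions), every one of M hypotheses is a "fair bin" with true frequency \mu = 0.5, and after each sample is drawn we keep the hypothesis with the smallest \nu. The chosen hypothesis violates the single-bin bound wildly, while the union-bound version 2Me^{-2\varepsilon^2 N}, though very loose here, is never violated.

```python
import math
import random

def worst_of_many(M=1000, N=20, eps=0.3, trials=200, seed=1):
    """All M hypotheses have true frequency mu = 0.5 (fair bins).
    After each sample is drawn, keep the hypothesis with the smallest
    nu -- the 'choose h because of its small nu' scenario.  Returns
    the empirical probability that the chosen nu deviates from mu by
    more than eps."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        best_nu = min(
            sum(rng.random() < 0.5 for _ in range(N)) / N
            for _ in range(M)
        )
        if abs(best_nu - 0.5) > eps:
            bad += 1
    return bad / trials

selected = worst_of_many()
single_bin = 2 * math.exp(-2 * 0.3**2 * 20)  # about 0.055, for one fixed h
union_bound = 1000 * single_bin              # about 55: loose, but valid
print(selected, single_bin, union_bound)
```

With M = 1000 bins of only N = 20 draws each, the smallest \nu almost always deviates from \mu by more than \varepsilon, so the selected hypothesis breaks the single-bin guarantee even though each bin obeys it individually.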

Finally, taking the example you illustrated, any hypothesis you use has to be in {\cal H} (which is decided before the sample is drawn). The one you constructed is not guaranteed to be in {\cal H}. Of course you can guarantee that it is in {\cal H} by taking {\cal H} to be the set of all possible hypotheses, but in this case, M is thoroughly infinite and the multiple-bin Hoeffding does not guarantee anything at all.
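For reference, the multiple-bin bound referred to above can be written as follows (in the book's notation, where g is the final hypothesis chosen from {\cal H}):

```latex
\mathbb{P}\left[\,\left|\nu_g - \mu_g\right| > \varepsilon\,\right]
\;\le\; \sum_{m=1}^{M} \mathbb{P}\left[\,\left|\nu_m - \mu_m\right| > \varepsilon\,\right]
\;\le\; 2M\,e^{-2\varepsilon^{2}N}
```

As M \to \infty the right-hand side diverges, which is exactly why the bound "does not guarantee anything at all" in that case.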
Where everyone thinks alike, no one thinks very much