Quote:
Originally Posted by scottedwards2000
The multiple bin analogy of picking the best h is a very helpful way of visualizing the situation, and I totally get how the union bound sets an upper limit on the probability of exceeding the error threshold. What I am actually questioning is whether those individual probabilities that compose the union bound are correct. I can see that they are just the individual Hoeffding Inequalities for each h, but is the Hoeffding Inequality really valid for all those h's in spite of the fact that we are NOT taking random samples from each "bin"? We are only picking our marbles (x's) ONCE, and then repicking the same marbles from each bin (yes, the red/green colors of those marbles can change, based on the specific h, but aren't they the same marbles (x's)?).

I see your concern. Here is one way to argue about it.
Assume the marbles have numbers that identify the point they are associated with, in addition to being green or red. Start with one bin, pick the marbles at random, and look at the colors in sample and out of sample. It is clear that the Hoeffding inequality holds for this bin, since the experiment is that of a single bin, regardless of the numbers on the marbles in the sample.
Now if you reuse these numbers to do something else somewhere else, that will not alter the applicability of Hoeffding to the bin we started with, right? You can now view the multiple-bin experiment as starting with any given bin, then reusing the numbers you got in sample from that bin for the rest of the bins. Hoeffding still applies to that original given bin, regardless of which one it is. Since that bin is arbitrary, Hoeffding must apply to all of the bins individually.
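
If it helps to see this numerically, here is a rough simulation sketch of the argument above. Everything in it is an assumption made up for illustration and is not from the course material: the inputs are uniform on [0, 1], the target is a threshold at 0.5, and each "bin" is a threshold hypothesis h_t. The same sample of points is reused for every bin, and for each bin separately we estimate how often |E_in - E_out| exceeds eps, to compare with the single-bin Hoeffding bound 2*exp(-2*eps^2*N).

```python
import math
import random


def simulate(thresholds, n_points=50, eps=0.2, n_trials=2000, seed=0):
    """Estimate, per hypothesis, how often |E_in - E_out| > eps when every
    hypothesis is evaluated on the SAME sample of points."""
    rng = random.Random(seed)
    target = 0.5  # hypothetical target: f(x) = +1 iff x > 0.5
    # For x ~ Uniform[0, 1], h_t disagrees with f exactly between t and 0.5,
    # so the out-of-sample error of h_t is |t - 0.5|.
    e_out = [abs(t - target) for t in thresholds]
    bad_counts = [0] * len(thresholds)
    any_bad = 0
    for _ in range(n_trials):
        # The marbles (x's) are picked ONCE per trial and reused for all bins.
        xs = [rng.random() for _ in range(n_points)]
        bad_here = False
        for k, t in enumerate(thresholds):
            mistakes = sum((x > t) != (x > target) for x in xs)
            e_in = mistakes / n_points
            if abs(e_in - e_out[k]) > eps:
                bad_counts[k] += 1
                bad_here = True
        any_bad += bad_here
    per_bin = [c / n_trials for c in bad_counts]
    return per_bin, any_bad / n_trials


thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]  # illustrative choice of M = 5 bins
N, EPS = 50, 0.2
per_bin, union_freq = simulate(thresholds, n_points=N, eps=EPS)
hoeffding = 2 * math.exp(-2 * EPS**2 * N)

print("single-bin Hoeffding bound:", hoeffding)
for t, p in zip(thresholds, per_bin):
    print(f"h_t with t={t}: estimated P[|E_in - E_out| > eps] = {p}")
print("estimated P[some bin is bad]:", union_freq)
print("union bound M * Hoeffding:", len(thresholds) * hoeffding)
```

With these settings, each per-bin frequency should come out below the single-bin bound even though the points are shared. The bad events of different bins are correlated because of the shared points, but the union bound does not need independence, only that each individual probability is correctly bounded.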