Quote:
Originally Posted by ladybird2012
Hi,
I would like to request a little clarification on the VC bound from Lecture 6. For the slide entitled "What to do about Eout?", I understand that the Ein and Ein' both track Eout, although more loosely than Ein on just one sample. But I don't understand why having the Ein on two samples (Ein and Ein') allows us to characterize them in terms of dichotomies. What is so special about 2 samples (why not 1 or 3?)? Is this something that becomes clear in the proof (which I haven't looked at yet) or is this something that can be understood conceptually?
Sorry I wasn't able to ask this question in the Q&A but it is not practical for me to follow the lectures live.
Thanks a lot.

The two samples are a technical trick that allows us to consider the dichotomies on a finite sample of data points (2N of them), while still capturing the fact that with multiple hypotheses, the Hoeffding-type bounds become looser and looser. This gives us a concrete way of accounting for the overlaps in terms of a combinatorial quantity rather than a full probabilistic analysis of the dependence between events.
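To make the "finite sample" point concrete, here is a minimal sketch, not taken from the lecture. It assumes a very simple hypothesis set, positive rays h(x) = sign(x - a), and made-up data points; the names `positive_ray_dichotomies`, `sample`, and `ghost` are purely illustrative. It shows that many thousands of distinct hypotheses collapse into only a handful of dichotomies once we look only at the 2N points of the two samples.

```python
def positive_ray_dichotomies(points, thresholds):
    """Return the set of distinct +1/-1 labelings that the positive rays
    h(x) = +1 if x > a else -1 produce on `points`, over all a in `thresholds`."""
    return {tuple(1 if x > a else -1 for x in points) for a in thresholds}

N = 5
# Illustrative (assumed) data: two samples of N points each.
sample = [0.12, 0.35, 0.51, 0.77, 0.93]   # the sample behind Ein
ghost = [0.05, 0.28, 0.44, 0.66, 0.81]    # the second sample behind Ein'
combined = sample + ghost                  # 2N points in total

# A fine grid of 10001 thresholds stands in for the infinite hypothesis set.
thresholds = [-0.5 + 2.0 * k / 10000 for k in range(10001)]

dichotomies = positive_ray_dichotomies(combined, thresholds)
print(len(thresholds), "hypotheses,", len(dichotomies), "dichotomies on 2N points")
```

For positive rays the count comes out to 2N + 1, the growth function of that hypothesis set evaluated at 2N. Both Ein and Ein' depend on a hypothesis only through its dichotomy on these 2N points, whereas Eout depends on the hypothesis over the whole input space, which is why a single sample together with Eout does not reduce to counting dichotomies.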
Having said that, what I did in the lecture was only a sketch of the proof to underline the main ideas in the formal proof. To pin it down completely, there is really no alternative to going through the formal proof which appears in the Appendix. It is not that hard, but certainly not trivial.