01-14-2013, 07:55 PM
yaser
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,478
Re: M=|H|? (Lecture 2 slide 16-17)

Originally Posted by Haowen
I have a question regarding the value of M in the multiple-bins Hoeffding bound slides.

M is supposed to be the number of different alternate hypotheses considered by the learning algorithm.

At the same time, H is the space of possible hypotheses that can be considered by the algorithm (e.g., all linear functions, etc).

I keep going back and forth in my mind about whether M=|H|.

Specifically, suppose that for a SPECIFIC training set X, after looking at the data points in X, the algorithm only explored some subset of H, say G with |G| < |H|.

Would it then be correct to set M = |G| and say that, for the specific training set X, the probability of the hypothesis being bad is at most 2|G| * the Hoeffding bound? Or would this be incorrect, since the theorem only deals with the behavior of the system over all possible X with the distribution P?

You raise interesting points. First, indeed $M=|{\cal H}|$. Second, if the algorithm does not fully explore the hypothesis set ${\cal H}$, then $M$ is still a working upper bound as far as generalization from in-sample to out-of-sample is concerned. Third, the analysis fixes ${\cal H}$ before the data set ${\cal D}$ is presented, and is done independently of the probability distribution $P$, i.e., the same bound applies regardless of which $P$ is the true distribution.
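
The second point can be seen from the union bound argument on the lecture slides. As a sketch, assuming the lecture's notation ($g$ for the final hypothesis, $E_{\rm in}$ and $E_{\rm out}$ for in-sample and out-of-sample error, $\epsilon$ for the tolerance, $N$ for the number of data points): whichever hypothesis the algorithm ends up with, it is some member of ${\cal H}$, so

```latex
\begin{aligned}
\mathbb{P}\big[\,|E_{\rm in}(g) - E_{\rm out}(g)| > \epsilon\,\big]
  &\le \mathbb{P}\Big[\,\bigcup_{h \in {\cal H}} \big\{ |E_{\rm in}(h) - E_{\rm out}(h)| > \epsilon \big\}\Big] \\
  &\le \sum_{h \in {\cal H}} \mathbb{P}\big[\,|E_{\rm in}(h) - E_{\rm out}(h)| > \epsilon\,\big] \\
  &\le 2M e^{-2\epsilon^2 N}, \qquad M = |{\cal H}|.
\end{aligned}
```

The subset G in the question is itself determined by the data, so its members are not fixed before the data set is seen. That is why the sum is taken over all of ${\cal H}$, and why replacing $M$ by $|G|$ is not justified by this argument alone.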

In some cases, we can find a better (read: smaller) upper bound, such as in regularization, which will be studied later in the course.
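
To get a feel for why a smaller bound matters, here is a minimal numerical sketch of the multiple-bins bound $2Me^{-2\epsilon^2 N}$ from the slides. The function names are made up for illustration, and the values of M, N, epsilon, and delta are arbitrary choices, not numbers from the lecture.

```python
import math

def hoeffding_union_bound(M, N, eps):
    """Upper bound 2*M*exp(-2*eps^2*N) on P[|E_in - E_out| > eps].

    The value is capped at 1 because a probability bound above 1 is vacuous.
    """
    return min(1.0, 2.0 * M * math.exp(-2.0 * eps ** 2 * N))

def samples_needed(M, eps, delta):
    """Smallest N for which 2*M*exp(-2*eps^2*N) <= delta."""
    return math.ceil(math.log(2.0 * M / delta) / (2.0 * eps ** 2))

# Illustrative values only: the bound grows linearly in M,
# while the number of samples needed grows only logarithmically in M.
for M in (1, 100, 10_000):
    bound = hoeffding_union_bound(M, N=1000, eps=0.1)
    n_req = samples_needed(M, eps=0.1, delta=0.05)
    print(f"M={M:>6}  bound={bound:.3e}  N needed for delta=0.05: {n_req}")
```

A smaller M tightens the bound proportionally, but since N enters through the exponent, a modest increase in the number of data points can compensate for a much larger M.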
Where everyone thinks alike, no one thinks very much