Quote:
Originally Posted by Haowen
I have a question regarding the value of M in the multiple-bins Hoeffding bound slides.
M is supposed to be the number of different alternate hypotheses considered by the learning algorithm.
At the same time, H is the space of possible hypotheses that can be considered by the algorithm (e.g., all linear functions, etc.).
I keep going back and forth in my mind about whether M=|H|.
Specifically, suppose that for a SPECIFIC training set X, after looking at the data points in X, the algorithm only explored some subset of H, say G with |G| < |H|.
Would it then be correct to set M = |G| and say that, for the specific training set X, the probability of the hypothesis being bad is at most 2|G| times the Hoeffding bound? Or would this be incorrect, since the theorem only deals with the behavior of the system over all possible X with the distribution P?
Thanks!
You raise interesting points. First, indeed M = |H|. Second, if the algorithm does not fully explore the hypothesis set H, then M = |H| is still a working upper bound as far as generalization from in-sample to out-of-sample is concerned. Third, the analysis fixes H before the data set X is presented, and is done independently of the probability distribution P, i.e., the same bound applies regardless of which P is the true distribution.
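For reference, the bound under discussion can be written out as follows (a sketch in the usual notation, where E_in and E_out denote the in-sample and out-of-sample errors, g is the final hypothesis, N is the number of data points, and epsilon is the tolerance; these symbols are assumed here rather than taken from the thread):

\[
  \Pr\bigl[\,\lvert E_{\mathrm{in}}(g) - E_{\mathrm{out}}(g)\rvert > \epsilon\,\bigr]
  \;\le\; 2M\, e^{-2\epsilon^{2} N}, \qquad M = |H|.
\]

Replacing M by a smaller number would only shrink the right-hand side, but |G| is determined only after looking at X, which is why it cannot simply be plugged in for M in this analysis.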
In some cases, we can find a better (read: smaller) upper bound, such as with regularization, which will be studied later in the course.
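To get a feel for the numbers, here is a minimal sketch of how the right-hand side of the bound changes with M (the values N = 1000, epsilon = 0.1, and the set sizes 1000 and 50 are purely illustrative, not from the thread):

    from math import exp

    def hoeffding_union_bound(M, N, eps):
        # Multiple-bin Hoeffding bound: P[|E_in(g) - E_out(g)| > eps] <= 2*M*exp(-2*eps^2*N)
        return 2 * M * exp(-2 * eps ** 2 * N)

    N, eps = 1000, 0.1
    print(hoeffding_union_bound(M=1000, N=N, eps=eps))  # bound with M = |H| (hypothetical size)
    print(hoeffding_union_bound(M=50, N=N, eps=eps))    # bound with a smaller M

A smaller M gives a tighter number, which is the sense in which a smaller upper bound helps.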