Quote:
Originally Posted by vsthakur
Hi,
If I got it right, in a noiseless setting, for a fixed D, if all f are equally likely, the expected off-training-error of any hypothesis h is 0.5 (part d of problem 1.10, page 37) and hence any two algorithms are the same in terms of expected off training error (part e of the same problem).
My question is, does this not contradict the generalization by Hoeffding. Specifically, the following point is bothering me
By Hoeffding : Ein approaches Eout for larger number of hypothesis (i.e for small epsilon) as N grows sufficiently large. Which would imply that expected(Eout) should be approximately the same as expected(Ein) and not a constant (0.5).
Can you please provide some insight on this, perhaps my comparison is erroneous.
|
This is an important question, and I thank you for asking it. There is a subtle point that creates this impression of contradiction.
On face value, the statement "all

are equally likely" sounds reasonable. Mathematically, it corresponds to trying to learn a randomly generated target function, and getting the average performance of this learning process. It should not be a surprise that we would get 0.5 under these circumstances, since almost all random functions are impossible to learn.
In terms of

and

, Hoeffding inequality certainly holds for
each of these random target functions, but

itself will be close to 0.5 for almost all of these functions since they have no pattern to fit, so Hoeffding would indeed predict

to be close to 0.5 on average.
This is why learning was decomposed into two separate questions in this chapter. In terms of these two questions, the one that "fails" in the random function approach is "

?"
Let me finally comment that treating "all

are equally likely" as a plausible statement is a common trap. This issue is addressed in detail in the context of Bayesian learning in the following video segment:
http://work.caltech.edu/library/182.html