Quote:
Originally Posted by magdon
Perhaps I am missing some subtlety but this is exactly what is being said in Section 1.3.
Hoeffding says that the probability of an unreliable data set is small, and so you can "safely" assume the data set was reliable and trust what Ein says.

Hoeffding will tell us the probability that a data set is unreliable, and Section 1.3 does present Hoeffding. But beyond that, yes, you're missing a lot, Malik ;).
- Section 1.3 contains no discussion of how small the Hoeffding bound should be in order to make meaningful learning claims.
To the contrary, the textbook seems content to use relatively large values for the bound (see Problem 2.1, for instance). But if we want to make meaningful target-assumption-free learning claims, values that large will not work. The book should make this clear; it does not.
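To make the "how small is small enough" point concrete, here is a minimal sketch of the single-hypothesis Hoeffding bound, 2·exp(-2ε²N). The values of ε and N below are illustrative choices of mine, not the numbers used in Problem 2.1:

```python
import math

# Single-hypothesis Hoeffding bound: P[|Ein - Eout| > eps] <= 2*exp(-2*eps^2*N).
# eps is the tolerance, N the number of training examples.
def hoeffding_bound(eps, n):
    return 2 * math.exp(-2 * eps ** 2 * n)

# Illustrative values (not from the book): note how slowly the bound shrinks.
for n in (100, 1000, 10000):
    print(f"N = {n:>6}: bound = {hoeffding_bound(0.05, n):.3g}")
```

At N = 100 with ε = 0.05 the bound exceeds 1, i.e. it is vacuous; only at much larger N does it become a tiny probability of the kind my argument requires.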
- Section 1.3 makes no mention of "safely" assuming that the data set is reliable.
To the contrary, rather than making concrete assumptions about the data set, Section 1.3 makes probabilistic claims such as (p. 20) "knowing that we are within ε of μ most of the time is a significant improvement over not knowing anything at all." But if we want to make meaningful target-assumption-free learning claims, knowing that the data set is reliable "most of the time" is not enough. We need to be certain that the data set is not misleading, analogous to being certain that our computer hardware has not produced a misleading output. The book should make this clear; it does not.
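The gap between "most of the time" and certainty is easy to see in a quick simulation of the book's bin model. Everything here is illustrative (the bin frequency mu, sample size, and tolerance are assumed values, not the book's):

```python
import random

# Draw datasets of size n from a "bin" whose true frequency is mu, and count
# how often the sample frequency nu is misleading, i.e. |nu - mu| > eps.
def misleading_fraction(mu, n, eps, trials, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    bad = 0
    for _ in range(trials):
        nu = sum(rng.random() < mu for _ in range(n)) / n
        if abs(nu - mu) > eps:
            bad += 1
    return bad / trials

# Illustrative run: reliable most of the time, but not all of the time.
print(misleading_fraction(mu=0.5, n=100, eps=0.1, trials=2000))
```

The fraction comes out small but nonzero: some data sets are misleading, which is exactly why "most of the time" falls short of the certainty my argument demands.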
- Section 1.3 fails to mention that deciding to assume that the data set is reliable is not a decision grounded in probability theory.
To the contrary, Section 1.3 claims that "[b]y adopting the probabilistic view ['alone' is implied], we get a positive answer to the feasibility question." But if we want to make meaningful target-assumption-free learning claims, probability theory alone will not give us a positive answer to the feasibility question, because probability theory alone gives us Wolpert's No Free Lunch theorems, which apparently imply that target assumptions must be made. No, in addition to Hoeffding we need something that is not justified by probability theory per se in order to justify learning. But making such a non-probability-justified assumption is controversial because it conflicts with conclusions drawn from Bayesian (probabilistic) decision theory. The book should make this clear; it does not.
In short, while I'm glad you appear to agree with my argument, Malik, it is my argument, not the book's. The argument in Section 1.3 uses Hoeffding and assumes that an algorithm will output a hypothesis only if the error on the training set is small, so in those ways it resembles my argument. But, as you already agreed in your first post in this thread, the argument of Section 1.3 is fundamentally flawed. You are not saying that about my argument. Ergo, the arguments are different.