05-17-2012, 10:17 PM
magdon
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 597
Re: How does deterministic noise cause overfitting?

"Being led astray" refers to the "noise" in the finite data set leading the learning algorithm in the wrong direction, so that it outputs the wrong final hypothesis (even though \cal{H} and f are fixed). This tendency is worse for a more complex \cal{H}, because it has more flexibility to (over)fit the noise and hence be led astray. This is what contributes most to the var term in the bias-variance decomposition. Different data sets (with noise) will lead the learning astray in wildly different directions, resulting in high var.
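
For reference, the decomposition mentioned above can be written as follows for squared error and a noiseless target. This is a sketch of the standard form; the notation g^{(\mathcal{D})} for the hypothesis learned from data set \mathcal{D} is an assumption made here and does not appear in the post.

```latex
\begin{align*}
\bar g(x) &= \mathbb{E}_{\mathcal{D}}\big[g^{(\mathcal{D})}(x)\big], \\
\mathbb{E}_{\mathcal{D}}\big[E_{out}(g^{(\mathcal{D})})\big]
  &= \underbrace{\mathbb{E}_{x}\big[(\bar g(x) - f(x))^2\big]}_{\text{bias}}
   + \underbrace{\mathbb{E}_{x}\Big[\mathbb{E}_{\mathcal{D}}\big[(g^{(\mathcal{D})}(x) - \bar g(x))^2\big]\Big]}_{\text{var}}.
\end{align*}
```

If zero-mean stochastic noise with variance \sigma^2 is added to the targets, an extra \sigma^2 term appears on the right-hand side. The var term is the one that measures how far different data sets pull the final hypothesis away from \bar g.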

We didn't precisely define deterministic noise; we just gave the intuitive idea. The bias is closely related to it, though not exactly the same. Indeed, though \bar g might be worse for smaller N, its dependence on N is mild. See, for example, Problem 3.14 as evidence that the bias has only a mild dependence on N. In practice, \bar g is close to h^* no matter what N is, and so the bias is more or less the deterministic noise.
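
The mild dependence of the bias on N can be checked numerically. The following is a minimal sketch, not the setup of Problem 3.14: it assumes a target f(x) = sin(pi x) on [-1, 1] with no stochastic noise, takes \cal{H} to be polynomials of a fixed degree fitted by least squares, and estimates \bar g, bias and var by averaging over many random data sets. The function names `target` and `bias_var` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def target(x):
    # Assumed target function f; any fixed f outside H would do.
    return np.sin(np.pi * x)


def bias_var(degree, n_points, n_datasets=1000, seed=0):
    """Monte Carlo estimate of (bias, var) for least-squares polynomial fits.

    There is no stochastic noise here: all variation between the fitted
    hypotheses comes from which x values land in each data set.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(-1.0, 1.0, 201)          # grid approximating E_x
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        x = rng.uniform(-1.0, 1.0, n_points)      # a fresh data set D
        coeffs = np.polyfit(x, target(x), degree)  # g^(D): least-squares fit
        preds[i] = np.polyval(coeffs, x_test)
    g_bar = preds.mean(axis=0)                    # estimate of the average hypothesis
    bias = np.mean((g_bar - target(x_test)) ** 2)
    var = np.mean(preds.var(axis=0))
    return bias, var


if __name__ == "__main__":
    for n in (5, 20, 100):
        bias, var = bias_var(degree=1, n_points=n)
        print(f"N={n:4d}  bias={bias:.3f}  var={var:.3f}")
```

Under these assumptions, the bias for lines should stay close to the error of the best line h^* (which works out to 1/2 - 3/pi^2, about 0.196, for this target), while var shrinks as N grows. Raising `degree` is one way to see how a more complex \cal{H} changes the two terms.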

Originally Posted by mic00
Thanks very much for the detailed reply!

This is a helpful distinction. The idea of being "led astray" has also been nice for intuition.

This makes perfect sense as well, and is how I had been thinking of the major impact of deterministic noise in causing overfitting. What spurred me to think about this is in fact the exercise on page 125, and the hint that, as \cal{H} becomes more complex, there are two factors affecting overfitting. The bias/variance trade-off, and thus the indirect impact of deterministic noise, is clear, but the idea that deterministic noise (bias) would directly cause overfitting is a little confusing.

What I am curious about is how we can be "led astray" if \cal{H} and f must stay fixed, and in my mind, I keep coming back to the precise definition of \bar g: if N (size of training data set) is very small, variance will suffer, but also \bar g will differ from the best hypothesis in \cal{H}, leading to higher deterministic noise; if N is big enough, \bar g will match the best hypothesis closely, and both variance and deterministic noise will shrink. So, even in cases of very large deterministic noise, if N is very big and gives us a near-perfect shape of the target function, we are not "led astray" at all (and indeed E_{in} would track E_{out} very well). It seems like that wiggle room in the deterministic noise tracks a bigger change in the variance. Does this make sense?
Have faith in probability