  #3  
Old 05-17-2012, 04:28 PM
mic00
Invited Guest
 
Join Date: Apr 2012
Posts: 49
Re: How does deterministic noise cause overfitting?

Thanks very much for the detailed reply!

Quote:
Originally Posted by magdon
Doing stuff that looks good in-sample that leads to disasters out-of-sample is the essence of overfitting. An example of this is trying to choose the regularization parameter. If you pick a lower regularization parameter, then you have lower in-sample error, but it leads to higher out-of-sample error - you picked the \lambda with lower E_{in} but it gave higher E_{out}. We call that overfitting. Underfitting is just the name we give to the opposite process in the context of picking the regularization parameter.
This is a helpful distinction. The idea of being "led astray" has also been nice for intuition.

Quote:
Originally Posted by magdon
To understand what is going on, the Bias Variance decomposition helps (bottom of page 125 in the textbook).

E_{out}=\sigma^2+bias+var

\sigma^2 is the direct impact of the stochastic noise. bias is the direct impact of the deterministic noise. The var term is interesting and is the indirect impact of the noise, through \cal{H}. The var term is mostly controlled by the size of \cal{H} in relation to the number of data points. So getting back to the point, if you make \cal{H} more complex, you will decrease the det. noise (bias) but you will increase the var (its indirect impact).
This makes perfect sense as well, and it is how I had been thinking of the major impact of deterministic noise in causing overfitting. What spurred me to think about this is in fact the exercise on page 125, and its hint that, as \cal{H} becomes more complex, there are two factors affecting overfitting. The bias/variance trade-off -- and thus the indirect impact of deterministic noise -- is clear, but how deterministic noise (bias) would directly cause overfitting is still a little confusing to me.

What I am curious about is how we can be "led astray" when \cal{H} and f stay fixed. I keep coming back to the precise definition of \bar g. If N (the size of the training data set) is very small, the variance suffers, but \bar g also differs from the best hypothesis in \cal{H}, which raises the deterministic noise. If N is big enough, \bar g closely matches the best hypothesis, and both the variance and the deterministic noise shrink. So even when the deterministic noise is very large, if N is very big and gives us a near-perfect picture of the target function, we are not "led astray" at all (and indeed E_{in} would track E_{out} very well). It seems that this wiggle room in the deterministic noise comes with a much bigger change in the variance. Does this make sense?
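
For what it is worth, the argument above can be checked numerically. The following is only a rough sketch under assumptions that are not from the thread: the target is taken to be f(x) = sin(\pi x) on [-1, 1], \cal{H} is the set of lines fit by least squares, the data are noiseless (so \sigma^2 = 0 and all of the error is bias plus var), and \bar g is approximated by averaging the fitted lines over many simulated data sets. The names `fit_line` and `bias_var` are made up for illustration.

```python
import math
import random


def f(x):
    # Assumed target function (not from the thread): f(x) = sin(pi * x).
    return math.sin(math.pi * x)


def fit_line(xs, ys):
    # Ordinary least-squares fit of h(x) = a * x + b.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def bias_var(n_points, n_datasets=2000, n_test=100, seed=0):
    # Monte Carlo estimate of bias and var for lines fit to n_points
    # noiseless samples of f, with x drawn uniformly from [-1, 1].
    rng = random.Random(seed)
    fits = []
    for _ in range(n_datasets):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
        ys = [f(x) for x in xs]
        fits.append(fit_line(xs, ys))

    # g_bar is approximated by the average of the fitted lines.
    a_bar = sum(a for a, _ in fits) / n_datasets
    b_bar = sum(b for _, b in fits) / n_datasets

    # Evenly spaced test points stand in for the expectation over x.
    test_xs = [-1.0 + 2.0 * (i + 0.5) / n_test for i in range(n_test)]

    bias = sum((a_bar * x + b_bar - f(x)) ** 2 for x in test_xs) / n_test
    var = sum(
        ((a * x + b) - (a_bar * x + b_bar)) ** 2
        for a, b in fits
        for x in test_xs
    ) / (n_datasets * n_test)
    return bias, var


if __name__ == "__main__":
    for n in (2, 5, 20, 50):
        bias, var = bias_var(n)
        print(f"N={n:3d}  bias={bias:.3f}  var={var:.3f}")
```

Under these assumptions the best line in \cal{H} has out-of-sample error 1/2 - 3/\pi^2, about 0.196, so the thing to look at is whether bias drifts down toward that floor as N grows while var falls by a much larger amount. That would be the "wiggle room" in the deterministic noise being small compared with the change in variance.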