This is a very subtle question!
The most important thing to realize is that in learning, the target function f is fixed and the hypothesis set H is given, and so both can be assumed fixed. Now we can ask what is going on in this learning scenario. Here is what we can say:
i) If there is stochastic noise with 'magnitude' σ², then you are in trouble.
ii) If there is deterministic noise, then you are in trouble.
The stochastic noise can be viewed as one part of the data generation process (e.g. measurement errors). The deterministic noise can similarly be viewed as another part of the data generation process, namely the part of the target f that H cannot capture. The deterministic and stochastic noise are fixed. In your analogy, you can increase the stochastic noise by increasing the noise variance and you get into deeper trouble. Similarly, you can increase the deterministic noise by making f more complex, and you will get into deeper trouble.
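Not from the original discussion, but here is a minimal Python sketch of that view of the data generation process, assuming a sinusoidal target f, H = linear functions, and Gaussian measurement noise (all illustrative choices of mine). The point is only that, from the learner's side, y is the best fit in H plus two fixed noise terms, one deterministic and one stochastic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed target f (more complex than anything in H) and fixed H = {linear functions}.
f = lambda x: np.sin(np.pi * x)

# Best linear approximation h* to f on [-1, 1], estimated on a dense grid.
x_grid = np.linspace(-1, 1, 2001)
a, b = np.polyfit(x_grid, f(x_grid), deg=1)
h_star = lambda x: a * x + b

sigma = 0.1                        # 'magnitude' of the stochastic noise
x = rng.uniform(-1, 1, size=20)    # one data set of N = 20 inputs

det_noise = f(x) - h_star(x)                      # deterministic noise: part of f that H cannot capture
stoch_noise = rng.normal(0, sigma, size=x.size)   # stochastic noise, e.g. measurement error

# What the learner actually sees: y = f(x) + stochastic noise,
# which is the same as (best fit in H) + deterministic noise + stochastic noise.
y = h_star(x) + det_noise + stoch_noise
```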
I just need to tell you what 'trouble' means. Well, we actually use another word instead of 'trouble': overfitting. This means you may be likely to make an inferior choice over the superior choice because the inferior choice has lower in-sample error. Doing stuff that looks good in-sample but leads to disaster out-of-sample is the essence of overfitting. An example of this is trying to choose the regularization parameter. If you pick a lower regularization parameter, then you have lower in-sample error, but it leads to higher out-of-sample error: you picked the λ with lower E_in but it gave higher E_out. We call that overfitting. Underfitting is just the name we give to the opposite process in the context of picking the regularization parameter. Once the regularization parameter gets too high, as you pick a higher λ you get both higher E_in and higher E_out. It also turns out that this means you over-regularized and obtained an oversimplistic final hypothesis g, i.e. you 'underfitted': you didn't fit the data enough. Underfitting and overfitting are just terms. The substance of what is going on under the hood is how the deterministic and stochastic noise affect what you should and should not do in-sample.
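To see the λ story numerically, here is a hedged sketch (the polynomial features, noise level, and sample sizes are arbitrary choices of mine, not from the textbook): it sweeps the weight-decay parameter of a ridge fit and prints E_in and E_out. At the small-λ end E_in is lowest but E_out is worse (overfitting); once λ gets too large, both E_in and E_out rise (underfitting).

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)
sigma = 0.3

def poly_features(x, degree=10):
    # Columns 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

# Small noisy training set; a large test set stands in for the out-of-sample error.
x_in, x_out = rng.uniform(-1, 1, 15), rng.uniform(-1, 1, 2000)
y_in = f(x_in) + rng.normal(0, sigma, x_in.size)
y_out = f(x_out) + rng.normal(0, sigma, x_out.size)
Z_in, Z_out = poly_features(x_in), poly_features(x_out)

for lam in [0.0, 1e-4, 1e-2, 1e-1, 1.0, 10.0]:
    # Weight-decay (ridge) solution: w = (Z'Z + lam*I)^(-1) Z'y
    w = np.linalg.solve(Z_in.T @ Z_in + lam * np.eye(Z_in.shape[1]), Z_in.T @ y_in)
    E_in = np.mean((Z_in @ w - y_in) ** 2)
    E_out = np.mean((Z_out @ w - y_out) ** 2)
    print(f"lambda={lam:g}  E_in={E_in:.3f}  E_out={E_out:.3f}")
```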
Now let's get back to the subtle part of your question. There is actually another way to decrease the deterministic noise: increase the complexity of H (the other way is to decrease the complexity of f, which we discussed above). Now is where the difference with stochastic noise pops up. With stochastic noise, it either goes up or down; if down, then things get better. With deterministic noise, if you just tell me that it went down, I need to ask you *how*. Did your target function get simpler? If yes, then great, it is just as if the stochastic noise went down. If it is that your H got more complicated, then things get interesting.

To understand what is going on, the bias-variance decomposition helps (bottom of page 125 in the textbook). σ² is the direct impact of the stochastic noise; bias is the direct impact of the deterministic noise. The var term is interesting: it is the indirect impact of the noise, through the fitting of a finite data set. The var term is mostly controlled by the size of H in relation to the number of data points. So getting back to the point, if you make H more complex, you will decrease the deterministic noise (bias) but you will increase the var (its indirect impact). Usually the latter dominates (overfitting, not because of the direct impact of the noise, but because of its indirect impact) ... unless you are in the underfitting regime, when the former dominates.
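As an illustration of that last point (again my own sketch with arbitrary choices of target, noise level, and polynomial hypothesis sets, not the book's code), one can estimate bias and var directly by repeating the learning experiment over many data sets: as H gets more complex, bias (deterministic noise) drops while var (the indirect impact of the noise) grows, and σ² stays put since the noise variance never changes.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(np.pi * x)
sigma, N, n_datasets = 0.2, 10, 2000
x_test = np.linspace(-1, 1, 200)

for degree in [1, 3, 7]:                        # H gets more complex as degree grows
    preds = np.empty((n_datasets, x_test.size))
    for d in range(n_datasets):
        x = rng.uniform(-1, 1, N)
        y = f(x) + rng.normal(0, sigma, N)      # stochastic noise with fixed variance sigma^2
        w = np.polyfit(x, y, degree)            # least-squares fit within this H
        preds[d] = np.polyval(w, x_test)
    g_bar = preds.mean(axis=0)                              # the 'average' hypothesis
    bias = np.mean((g_bar - f(x_test)) ** 2)                # direct impact of deterministic noise
    var = np.mean(preds.var(axis=0))                        # indirect impact of the noise
    print(f"degree={degree}: bias={bias:.3f}  var={var:.3f}  sigma^2={sigma**2:.3f}")
```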
Quote:
Originally Posted by mic00
I am still a little confused about this. It's clear to me that reducing deterministic noise can lead to overfitting (if there is not enough in-sample data), but the presence of deterministic noise itself seems (to me) to cause underfitting. Am I just being pedantic?
(I contrast this with stochastic noise: it cannot be reduced, and clearly any attempt to fit it is overfitting, because no amount of in-sample data will clarify its shape.)
