Quote:
Originally Posted by a.sanyal902
There was a previous thread (here) that discussed this, but I still have a nagging doubt.
Say the complexity of our hypothesis set matches that of the target function (or the set includes the target function). So, there is no deterministic noise. Moreover, let us assume there is no stochastic noise either.
However, because the data set is finite, we may still fail to generalize well. Is this still called overfitting? We have described overfitting as the algorithm selecting a hypothesis that fits the "noise", stochastic or deterministic. But there is no noise in the example above. We may call it variance, because we have many possible choices and few data points, but are we "overfitting"?

Firstly, you can have deterministic noise even if the exact target function is in the hypothesis set. The definition is based on the difference between an average hypothesis and the target function. This average hypothesis is not definable in terms of the hypothesis set alone: it requires a set $\mathcal{D}$ of samples, a probability distribution $P$ on $\mathcal{D}$, and an algorithm $\mathcal{A}$ which associates a hypothesis $g^{(D)} = \mathcal{A}(D)$ with each element $D$ of $\mathcal{D}$. The average hypothesis is then defined as the function each of whose values is the average over $\mathcal{D}$, with respect to $P$, of the hypotheses $\mathcal{A}$ generates:

$$\bar g(x) = \mathbb{E}_{D \sim P}\big[g^{(D)}(x)\big].$$
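To make the point concrete, here is a minimal Monte Carlo sketch, under assumptions of my own choosing (none of them come from the thread): the hypothesis set is lines $h(x) = w_0 + w_1 x$, the target $f(x) = 2x + 1$ is in that set, there is no stochastic noise, inputs are uniform on $[-1, 1]$, and the algorithm $\mathcal{A}$ is ridge regression with a hypothetical penalty `lam`. It estimates $\bar g$ and the bias $\mathbb{E}_x[(\bar g(x) - f(x))^2]$. With `lam = 0` the bias vanishes, but with `lam > 0` it does not, even though $f$ is in the hypothesis set: the mismatch comes from the algorithm, not from the set.

```python
import numpy as np


def ridge_fit(x, y, lam):
    """Fit h(x) = w0 + w1*x by ridge regression (assumed algorithm A).

    Note: this sketch penalizes the intercept as well as the slope.
    """
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)


def average_hypothesis(f, algorithm, n_points, n_datasets, rng):
    """Monte Carlo estimate of g-bar, returned as averaged weights.

    Each data set D has n_points inputs drawn uniformly from [-1, 1]
    (assumed P) with noiseless labels f(x). Because h(x) is linear in
    its weights, averaging the weights equals averaging g^(D)(x) pointwise.
    """
    weights = []
    for _ in range(n_datasets):
        x = rng.uniform(-1.0, 1.0, n_points)
        y = f(x)  # no stochastic noise
        weights.append(algorithm(x, y))
    return np.mean(weights, axis=0)


def bias(f, w_bar, rng, n_test=100_000):
    """Estimate E_x[(g_bar(x) - f(x))^2] with x uniform on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, n_test)
    g_bar = w_bar[0] + w_bar[1] * x
    return float(np.mean((g_bar - f(x)) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def target(x):
        return 2.0 * x + 1.0  # assumed target, a member of the hypothesis set

    for lam in (0.0, 1.0):
        w_bar = average_hypothesis(
            target, lambda x, y: ridge_fit(x, y, lam), 5, 2000, rng
        )
        print(f"lam={lam}: w_bar={w_bar}, bias={bias(target, w_bar, rng):.4g}")
```

The choice of ridge regression is only one convenient way to get an algorithm whose average hypothesis misses a target that the set itself contains.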
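Even if there is no deterministic noise, this certainly doesn't preclude the possibility of overfitting: overfitting merely means that, by comparison with some other machine $g'$, the machine $g$ gives lower in-sample error but greater out-of-sample error, i.e. $E_{\text{in}}(g) < E_{\text{in}}(g')$ while $E_{\text{out}}(g) > E_{\text{out}}(g')$.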
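A rough sketch of checking this criterion in the noiseless setting from the question, again under my own assumptions: the target is a random degree-10 polynomial (so it lies in $\mathcal{H}_{10}$), there is no stochastic noise, $N = 5$ training points are uniform on $[-1, 1]$, $g$ is the minimum-norm least-squares fit from $\mathcal{H}_{10}$, and $g'$ is the least-squares fit from $\mathcal{H}_2$. The helper names (`fit_poly`, `compare_machines`, `overfits`) are hypothetical. With fewer points than parameters, $g$ interpolates the data, so its in-sample error is essentially zero; whether it also has the larger out-of-sample error, and hence "overfits" relative to $g'$, depends on the particular draw.

```python
import numpy as np


def fit_poly(x, y, degree):
    """Least-squares polynomial fit; lstsq returns the minimum-norm
    solution when there are fewer points than coefficients."""
    X = np.vander(x, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def mse(w, x, y):
    """Mean squared error of the polynomial with coefficients w."""
    X = np.vander(x, len(w), increasing=True)
    return float(np.mean((X @ w - y) ** 2))


def compare_machines(n_points=5, seed=0):
    """Return {name: (E_in, E_out)} for g (from H_10) and g' (from H_2)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=11)  # assumed target: random member of H_10

    def f(x):
        return np.vander(x, 11, increasing=True) @ w_true

    x_train = rng.uniform(-1.0, 1.0, n_points)
    y_train = f(x_train)  # no stochastic noise
    x_test = rng.uniform(-1.0, 1.0, 10_000)  # estimate of E_out
    y_test = f(x_test)

    results = {}
    for name, degree in (("g", 10), ("g'", 2)):
        w = fit_poly(x_train, y_train, degree)
        results[name] = (mse(w, x_train, y_train), mse(w, x_test, y_test))
    return results


def overfits(e_in_g, e_out_g, e_in_gp, e_out_gp):
    """g overfits relative to g' if it has lower E_in but higher E_out."""
    return e_in_g < e_in_gp and e_out_g > e_out_gp


if __name__ == "__main__":
    r = compare_machines()
    print(r, "overfits:", overfits(*r["g"], *r["g'"]))
```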