05-11-2013, 04:18 PM
sptripathi
Re: Lec-11: Overfitting in terms of (bias, var, stochastic-noise)

Thanks, Elroch, for your detailed reply (and your patience). That helped.

[Just one clarification on my first set of questions. Let's assume that we always have 'sufficient' data points to learn from, for any choice of polynomial order in the hypothesis set, i.e., for H2 we have >> 20 points and for H10 we have >> 100 points, and likewise for any other order.]

Originally Posted by Elroch:
1. I think it is fair to say deterministic noise or bias can lead to overfitting as well. For example, suppose you try to model sine functions

a \sin(\pi x) on [-1,1]

with a hypothesis set made up of positive constant functions only

\mathcal H = \{ h(x)=k \mid k>0 \} on [-1,1]

This is such a bad hypothesis set for the job that however many data points you use, and however much regularization you use, you'd be better off in general using the single hypothesis consisting of the zero function. I would say this is a clear case of overfitting of the bias.
Here and in your last paragraph, you seem to suggest that bias is one form of overfitting. This is where I'm struggling. For instance, in your example above, a constant hypothesis sounds more like underfitting than overfitting. So it definitely has bias in that sense, but is that inability (bias) really overfitting?
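To make the quoted example concrete, here is a rough numerical sketch, not something from Elroch's post. It assumes a = 1, noiseless targets, x drawn uniformly from [-1,1], squared error, and a least-squares fit of the constant restricted to k > 0 (clipped at a small hypothetical k_min, since k = 0 itself is excluded from the hypothesis set). The names e_out and fit_positive_constant are made up for illustration.

```python
import numpy as np

A = 1.0  # amplitude a of the target a*sin(pi*x); the value is an assumption

def e_out(k, n_grid=2001):
    """Approximate mean squared error of h(x) = k on a uniform grid over [-1, 1]."""
    x = np.linspace(-1.0, 1.0, n_grid)
    return float(np.mean((k - A * np.sin(np.pi * x)) ** 2))

def fit_positive_constant(y, k_min=1e-6):
    """Least-squares constant restricted to k > 0; k_min stands in for 'just above zero'."""
    return max(float(np.mean(y)), k_min)

rng = np.random.default_rng(0)
errors = []
for _ in range(1000):
    # Draw a small noiseless training set and fit the best positive constant to it.
    x = rng.uniform(-1.0, 1.0, 5)
    k = fit_positive_constant(A * np.sin(np.pi * x))
    errors.append(e_out(k))

avg_fitted_error = float(np.mean(errors))
zero_error = e_out(0.0)
print("average out-of-sample error of fitted positive constant:", avg_fitted_error)
print("out-of-sample error of the zero function:", zero_error)
```

Under these assumptions the out-of-sample error of h(x) = k works out to k^2 + a^2/2, so every k > 0 loses to the zero function, and a constant fitted to a finite sample loses by however far the sample mean pulls it away from zero.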

Originally Posted by Elroch:
Say, for example, all possible 10th order polynomials on a unit interval are possibilities for some unknown function. Suppose however, that anything that is very far from a quadratic is very unlikely, and increasingly unlikely as the coefficients get bigger (excuse my vagueness, but the idea is that the actual function is a 10th order polynomial, but it is extremely unlikely to be much different from a quadratic).
It is indeed interesting to think about a probability distribution over the order of the target polynomial, relative to the order of the hypothesis polynomial.
However, even if we had a probability distribution over the target function's complexity, any given instance of it would still be a polynomial of fixed order, although we may not know what that order is. So we would use a validation set to gauge which order of hypothesis polynomial seems most promising. Right?
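As a sketch of what I mean by using a validation set to pick the order: the target below is a hypothetical 4th-order polynomial with made-up coefficients, the noise level and sample sizes are arbitrary, and validation_error is an illustrative helper rather than anything from the lecture.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(1)

# Hypothetical fixed target: coefficients c0..c4 of a 4th-order polynomial.
true_coefs = np.array([0.5, -1.0, 2.0, 0.0, -1.5])

def make_data(n, noise_std=0.1):
    """Sample n noisy points from the target on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, n)
    y = P.polyval(x, true_coefs) + rng.normal(0.0, noise_std, n)
    return x, y

x_train, y_train = make_data(200)
x_val, y_val = make_data(200)

def validation_error(order):
    """Fit a polynomial of the given order on the training set, score it on the validation set."""
    coefs = P.polyfit(x_train, y_train, order)
    return float(np.mean((P.polyval(x_val, coefs) - y_val) ** 2))

# Try every hypothesis order from 0 to 10 and keep the one with the lowest validation error.
val_errors = {order: validation_error(order) for order in range(11)}
best_order = min(val_errors, key=val_errors.get)
print("validation error by order:", val_errors)
print("selected order:", best_order)
```

The selected order need not equal the true order; with noisy data a slightly higher order can win by chance. The point is only that the validation error drops sharply once the hypothesis order is high enough to capture the target.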

Originally Posted by Elroch:
For example, fitting a 10th order polynomial with a 2nd order polynomial hypothesis (without regularization) may easily be overfitting if the data provided is only 3 points.
OK. Now we augment it with sufficient data points. Given that, H2 can never do as well as H10 in approximating a 10th-order polynomial (the target function). So H2 clearly has higher bias than H10, but is that inability an underfitting or an overfitting issue? Apologies for repeating the question.
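Here is a small experiment I would use to separate the two regimes. It is only a sketch under my own assumptions: a randomly drawn (hypothetical) 10th-order target, noiseless data, x uniform on [-1,1], plain least squares without regularization, and an illustrative helper avg_e_out that averages the out-of-sample error over many random training sets.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(2)

# Hypothetical 10th-order target with randomly drawn coefficients c0..c10.
target_coefs = rng.normal(0.0, 1.0, 11)

# Dense grid used to approximate the out-of-sample error.
x_test = np.linspace(-1.0, 1.0, 2001)
y_test = P.polyval(x_test, target_coefs)

def avg_e_out(order, n_points, trials=200):
    """Average out-of-sample squared error of an unregularized fit of the given order."""
    errors = []
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0, n_points)
        y = P.polyval(x, target_coefs)      # noiseless samples of the target
        coefs = P.polyfit(x, y, order)      # ordinary least-squares fit
        errors.append(np.mean((P.polyval(x_test, coefs) - y_test) ** 2))
    return float(np.mean(errors))

e_h2_few = avg_e_out(order=2, n_points=3)      # H2 with only 3 points
e_h2_many = avg_e_out(order=2, n_points=500)   # H2 with plenty of data
e_h10_many = avg_e_out(order=10, n_points=500) # H10 with plenty of data
print("H2, 3 points:   ", e_h2_few)
print("H2, 500 points: ", e_h2_many)
print("H10, 500 points:", e_h10_many)
```

With 3 points, H2 interpolates the sample exactly and its out-of-sample error varies wildly from one sample to the next. With 500 points that variance is mostly gone and what remains for H2 is the part of the target a quadratic cannot represent, while H10 recovers the target essentially exactly. That remaining gap is the bias my question is about.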