Thanks Elroch for your detailed reply (and your patience therein). That helped.
[ Just one clarification to my first set of questions: let's say we always have 'sufficient' data points to learn from, for any choice of polynomial order in the hypothesis set, i.e. for H2 we have >> 20 points and for H10 we have >> 100 points, and likewise for any other order. ]
Quote:
Originally Posted by Elroch
1. I think it is fair to say deterministic noise or bias can lead to overfitting as well. For example, suppose you try to model sine functions on some interval with a hypothesis set made up of positive constant functions on that interval. This is such a bad hypothesis set for the job that however many data points you use, and however much regularization you use, you'd be better off in general using the single hypothesis consisting of the zero function. I would say this is a clear case of overfitting of the bias.

Here and in your last paragraph, you seem to suggest that bias is one form of overfitting. This is where I'm struggling. For instance, in your example above, a constant hypothesis sounds more like underfitting than overfitting. It certainly has bias in that sense, but is that inability (bias) really overfitting?
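To make the distinction concrete, here is a small numerical sketch (my own hypothetical setup, not from Elroch's post): fit the best constant hypothesis to samples of sin(πx) on [-1, 1]. The signature of underfitting is that in-sample and out-of-sample errors are both large and close to each other, whereas overfitting would show a small in-sample error with a much larger out-of-sample error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: sin(pi * x) on [-1, 1], sampled without noise.
x_train = rng.uniform(-1, 1, 200)
y_train = np.sin(np.pi * x_train)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(np.pi * x_test)

# Best constant hypothesis under squared error = mean of training targets.
c = y_train.mean()
e_in = np.mean((y_train - c) ** 2)
e_out = np.mean((y_test - c) ** 2)

# Both errors are large (~0.5) and nearly equal: the model is not
# chasing noise in the sample, it simply cannot express the target.
print(round(e_in, 3), round(e_out, 3))
```

The gap between E_in and E_out stays small no matter how many points we add, which is why I'd instinctively call this underfitting rather than overfitting.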
Quote:
Originally Posted by Elroch
Say, for example, all possible 10th order polynomials on a unit interval are possibilities for some unknown function. Suppose however, that anything that is very far from a quadratic is very unlikely, and increasingly unlikely as the coefficients get bigger (excuse my vagueness, but the idea is that the actual function is a 10th order polynomial, yet it is extremely unlikely to be much different from a quadratic).

It is indeed interesting to think about a probability distribution on the order of the target polynomial, relative to the order of the hypothesis polynomial.
However, if we had a probability distribution on the target function's complexity, then any given instance of it would still be a polynomial of some fixed order, albeit one we may not know. So we would use a validation set to gauge which order of hypothesis polynomial seems most promising. Right?
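The validation-set procedure I have in mind can be sketched like this (a toy example with a made-up cubic target and noise level; none of these numbers come from the thread): fit each candidate order on a training split, score each on a held-out validation split, and pick the order with the lowest validation error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fixed target: a cubic, observed with Gaussian noise.
coeffs = np.array([0.5, -1.0, 0.3, 2.0])  # highest power first (np.polyval)
x = rng.uniform(-1, 1, 120)
y = np.polyval(coeffs, x) + rng.normal(0.0, 0.1, x.size)

# Split into training and validation sets.
x_tr, y_tr = x[:80], y[:80]
x_val, y_val = x[80:], y[80:]

# Fit each candidate order on training data, score on validation data.
val_err = {}
for d in range(0, 9):
    p = np.polyfit(x_tr, y_tr, d)
    val_err[d] = np.mean((np.polyval(p, x_val) - y_val) ** 2)

best = min(val_err, key=val_err.get)
print(best, round(val_err[best], 4))
```

Orders well below the true one carry visible bias in the validation error, while orders at or above it bottom out near the noise floor, which is exactly the signal we'd use to choose the hypothesis order.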
Quote:
Originally Posted by Elroch
For example, fitting a 10th order polynomial with a 2nd order polynomial hypothesis (without regularization) may easily be overfitting if the data provided is only 3 points.

Ok. Now we augment it with sufficient data points. Given that, H2 can never approximate a 10th-order target polynomial as well as H10 can, so H2 clearly has higher bias than H10. But is that inability an underfitting issue or an overfitting issue? Apologies for repeating the question.
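Here is the data-rich version of that question as a numerical sketch (again my own hypothetical target, the degree-10 Chebyshev polynomial T10, chosen only because a quadratic cannot track its wiggles): with thousands of noiseless points, H10 fits essentially exactly, while H2 is left with a large, irreducible error that is the same in and out of sample.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical fixed 10th-order target: Chebyshev T10, noiseless for clarity.
target = [512, 0, -1280, 0, 1120, 0, -400, 0, 50, 0, -1]
x_tr = rng.uniform(-1, 1, 5000)
y_tr = np.polyval(target, x_tr)
x_te = rng.uniform(-1, 1, 5000)
y_te = np.polyval(target, x_te)

# Fit H2 and H10 on abundant data; record (E_in, E_out) for each.
errs = {}
for d in (2, 10):
    p = np.polyfit(x_tr, y_tr, d)
    errs[d] = (np.mean((np.polyval(p, x_tr) - y_tr) ** 2),
               np.mean((np.polyval(p, x_te) - y_te) ** 2))

for d, (e_in, e_out) in errs.items():
    print(d, round(e_in, 6), round(e_out, 6))
```

H2's error here is pure bias: it does not shrink with more data, and E_in tracks E_out closely, which is why this looks to me like underfitting rather than overfitting, however the terminology settles.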