Originally Posted by ilya239
Thanks for the explanation.
In HW4 #4 the average hypothesis is measurably shifted from the hypothesis set member giving the lowest mean squared error. Probably because the two-point data set is too small, i.e. this is not representative of realistic cases?

Well, it is also that the two-point data set is small relative to the two-parameter hypotheses. If you had 100 points and fit 99th-degree polynomials (100 parameters), the variance would also be large. My guess is that bias plus variance is minimized when the number of fit parameters is near the square root of the number of points per data set.
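
For anyone who wants to test that guess numerically, here is a rough sketch rather than a definitive experiment. It assumes a noiseless target sin(pi x) with x uniform on [-1, 1] and ordinary least-squares polynomial fits; those choices, and the function name bias_variance, are mine for illustration and are not taken from the homework statement.

```python
import numpy as np


def bias_variance(n_points, n_params, n_datasets=2000, seed=0):
    """Monte Carlo estimate of (bias^2, variance) for polynomial fits.

    Assumptions (illustrative only): target f(x) = sin(pi x),
    x uniform on [-1, 1], no noise, least-squares fit of a polynomial
    with n_params coefficients to each data set of n_points points.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(-1.0, 1.0, 201)
    f_test = np.sin(np.pi * x_test)
    # Columns are 1, x, ..., x^(n_params - 1)
    v_test = np.vander(x_test, n_params, increasing=True)

    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        x = rng.uniform(-1.0, 1.0, n_points)
        v = np.vander(x, n_params, increasing=True)
        coef, *_ = np.linalg.lstsq(v, np.sin(np.pi * x), rcond=None)
        preds[i] = v_test @ coef

    g_bar = preds.mean(axis=0)               # average hypothesis
    bias2 = np.mean((g_bar - f_test) ** 2)   # average over test points
    variance = np.mean(preds.var(axis=0))    # spread around g_bar
    return bias2, variance


if __name__ == "__main__":
    n_points = 10
    for n_params in range(1, n_points + 1):
        b, v = bias_variance(n_points, n_params)
        print(f"params={n_params:2d}  bias^2={b:.4f}  var={v:.4f}  sum={b + v:.4f}")
```

Running the loop for a few values of n_points and noting where the sum is smallest would show whether the square-root rule of thumb holds up under these assumptions. A different target or added noise could move the minimum.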