Bias-variance plot on p. 67
Hi Prof. Abu-Mostafa,
As you suggested, I am posting below the question that I emailed you earlier, in case other people have similar questions. However, I couldn't insert/upload images properly here (only a link showed up), so I'll just ask a text-only question.
Specifically, I’m a little confused about the bias-variance plot at the bottom of page 67. In the plot, the bias appears to be a flat line, i.e. constant and independent of the training set size N. I wondered whether this is (approximately) true in general, so I ran some simulations. What I found was that while this was indeed approximately true for linear regression, it did not hold when I used the 1-nearest-neighbor (1NN) algorithm. (Similar to Example 2.8, I tried to learn a sinusoid.)
More specifically, for linear regression, the average learned hypothesis, i.e. "g bar", stays almost unchanged when the training set size N increases from 4 to 10 in my simulation. Even for N=2, "g bar" doesn’t deviate much.
However, for the 1-nearest-neighbor (1NN) algorithm, "g bar" changes considerably as N grows from 2 to 4 and then to 10. This seems reasonable to me, though, because as N increases, the distance between a test point x and its nearest neighbor decreases with high probability. So it’s natural to expect "g bar" to converge to the sinusoid, and the bias to decrease, as N increases.
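The shrinking nearest-neighbor distance can be checked with a quick Monte Carlo estimate. This is only a rough sketch: the function name `mean_nn_distance`, the trial count, and the seed are assumptions for illustration, and the test point is drawn uniformly on [0,1] like the training points.

```python
import numpy as np

def mean_nn_distance(n, trials=20000, seed=0):
    """Estimate the average distance from a random test point to the
    nearest of n training points, all drawn uniformly on [0, 1]."""
    rng = np.random.default_rng(seed)
    x_test = rng.uniform(size=trials)        # one test point per trial
    x_train = rng.uniform(size=(trials, n))  # n training points per trial
    # Distance to the closest training point in each trial, then average.
    return np.abs(x_train - x_test[:, None]).min(axis=1).mean()
```

Calling `mean_nn_distance(n)` for n = 2, 4, 10 should give a decreasing sequence, which is the effect that lets the 1NN "g bar" approach the target.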
Here's the simulated average (squared) bias when N was 2, 4, and 8:
OLS: 0.205, 0.199, 0.198
1NN: 0.184, 0.052, 0.013
where OLS stands for ordinary least squares linear regression.
Do these results and interpretations look correct to you, or am I mistaken somewhere? I’d greatly appreciate it if you could clarify this a little more for me. Thanks a lot!
BTW, in my simulation, each training set of size N is sampled independently and uniformly from the [0,1] interval. I then averaged the learned hypotheses from 5000 training sets to obtain each "g bar".
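For anyone who wants to try something similar, here is a minimal sketch of this kind of simulation. It is not the exact code behind the numbers above: the target sin(2πx), the noiseless labels, the 201-point test grid, the seed, and all function names are assumptions made for illustration.

```python
import numpy as np

def target(x):
    # Assumed target: one full period of a sinusoid on [0, 1].
    return np.sin(2 * np.pi * x)

def fit_ols(x_train, y_train, x_test):
    # Least-squares line y = a*x + b, evaluated on the test grid.
    a, b = np.polyfit(x_train, y_train, 1)
    return a * x_test + b

def fit_1nn(x_train, y_train, x_test):
    # Predict with the label of the closest training point.
    nearest = np.abs(x_test[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

def squared_bias(fit, n, trials=5000, seed=0):
    """Average the learned hypotheses over many training sets of size n
    to get "g bar", then return its mean squared gap to the target."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, 201)
    g_sum = np.zeros_like(x_test)
    for _ in range(trials):
        x_train = rng.uniform(0.0, 1.0, size=n)
        g_sum += fit(x_train, target(x_train), x_test)
    g_bar = g_sum / trials
    return np.mean((g_bar - target(x_test)) ** 2)

if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, squared_bias(fit_ols, n), squared_bias(fit_1nn, n))
```

Because "g bar" is itself estimated from a finite number of training sets, the reported bias includes a small amount of leftover variance, so it helps to keep the number of training sets large.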
Best regards,
Steve
