#1
Hi Prof. Abu-Mostafa,
As you suggested, I am posting below the question I emailed you earlier, in case other people have similar questions. I couldn't insert or upload images properly here (only a link showed up), so this is a text-only question.

Specifically, I'm a little confused about the bias-variance plot at the bottom of page 67. In the plot, the bias appears as a flat line, i.e. constant and independent of the sample (training set) size N. I wondered whether this is (approximately) true in general, so I ran some simulations. I found that it was indeed approximately true for linear regression, but not for the 1-nearest-neighbor (1-NN) algorithm. (Similar to Example 2.8, I tried to learn a sinusoid.)

More specifically, for linear regression the averaged learned hypothesis, i.e. "g bar", stays almost unchanged as the training set size N increases from 4 to 10 in my simulation. Even for N=2, "g bar" doesn't deviate much. For 1-NN, however, "g bar" changes considerably as N grows from 2 to 4 and then to 10. This seems reasonable to me, because as N increases, the distance between a test point x and its nearest neighbor decreases with high probability. So it's natural to expect "g bar" to converge to the sinusoid, and the bias to decrease, as N increases.

Here is the simulated average (squared) bias for N = 2, 4 and 8, where OLS stands for ordinary least squares linear regression:
OLS: 0.205, 0.199, 0.198
1-NN: 0.184, 0.052, 0.013

Do these results and interpretations look correct to you, or am I mistaken somewhere? I'd greatly appreciate it if you could clarify this a bit more for me. Thanks a lot!

BTW, in my simulation, each training set of size N is sampled independently and uniformly on the [0,1] interval. I then averaged the learned hypotheses from 5000 training sets to obtain each "g bar".

Best regards,
Steve
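A minimal sketch of the simulation described above, in Python with NumPy. Several details are assumptions for illustration only: the target is taken to be sin(2πx) (the post only says "a sinusoid" on [0,1]), the training data are assumed noiseless, the expectation over x is approximated on a hypothetical 1000-point test grid, and the helper names `fit_ols`, `fit_1nn` and `avg_sq_bias` are made up here. The 5000 training sets per N come from the post.

```python
import numpy as np


def target(x):
    # Assumption: the sinusoid is sin(2*pi*x); the post does not say which one.
    return np.sin(2 * np.pi * x)


def fit_ols(x_train, y_train, x_test):
    """Fit a least-squares line y = a + b*x and evaluate it on x_test."""
    X = np.column_stack([np.ones_like(x_train), x_train])
    coef, *_ = np.linalg.lstsq(X, y_train, rcond=None)
    return coef[0] + coef[1] * x_test


def fit_1nn(x_train, y_train, x_test):
    """1-NN: each test point copies the label of its closest training point."""
    idx = np.abs(x_test[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[idx]


def avg_sq_bias(fit, n, num_sets=5000, num_test=1000, seed=0):
    """Estimate E_x[(g_bar(x) - f(x))^2] for training sets of size n."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, num_test)  # grid approximating x ~ U[0, 1]
    g_sum = np.zeros(num_test)
    for _ in range(num_sets):
        x_train = rng.uniform(0.0, 1.0, n)  # i.i.d. uniform inputs on [0, 1]
        g_sum += fit(x_train, target(x_train), x_test)
    g_bar = g_sum / num_sets  # averaged hypothesis "g bar" on the grid
    return np.mean((g_bar - target(x_test)) ** 2)


results = {n: (avg_sq_bias(fit_ols, n), avg_sq_bias(fit_1nn, n)) for n in (2, 4, 8)}
for n, (ols_bias, nn_bias) in results.items():
    print(f"N={n}: OLS bias {ols_bias:.3f}, 1-NN bias {nn_bias:.3f}")
```

Averaging the predictions on a fixed grid gives "g bar" directly, so the bias is just the mean squared gap between that average and the target over the grid. Exact numbers will depend on the random seed and on which sinusoid is assumed.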
#2
Your observations are correct. The bias is only approximately constant; it is exactly constant only for a linear model with a linear target. In general, the bias converges very quickly to a constant. This is because there is some "best" …

The above discussion does not hold for nonparametric models like Nearest Neighbor, which do not fit the paradigm of a fixed hypothesis set. When N increases, the "hypothesis set" gets more "complex", and so the bias decreases with N (as your very nice experiment verifies). I congratulate you on delving deeper into the bias-variance decomposition and discovering this subtle phenomenon. If you would like to know more about this, you may refer to the section on Parametric versus Nonparametric models in e-Chapter 6, and also to the discussion of the self-regularizing property of Nearest Neighbor just before section 6.2.2, where we show some pictures illustrating how the Nearest Neighbor hypothesis gets "more complicated" as you increase N.
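A worked check of the "converges to a constant" point for the linear model. It assumes the target is f(x) = sin(2πx) with x uniform on [0,1] (the thread only says "a sinusoid"), and that for large N the averaged least-squares line approaches the line h* that is best in mean squared error over the whole input distribution. Under those assumptions the limiting bias has a closed form:

```latex
\begin{aligned}
h^*(x) &= a + b\,x, \qquad
b = \frac{\operatorname{Cov}(x, f(x))}{\operatorname{Var}(x)} = \frac{-1/(2\pi)}{1/12} = -\frac{6}{\pi}, \qquad
a = \mathbb{E}[f(x)] - b\,\mathbb{E}[x] = \frac{3}{\pi},\\[4pt]
\text{bias} = \mathbb{E}_x\!\left[(\bar g(x) - f(x))^2\right]
&\;\to\; \mathbb{E}_x\!\left[(h^*(x) - f(x))^2\right]
= \operatorname{Var}(f) - \frac{\operatorname{Cov}(x, f)^2}{\operatorname{Var}(x)}
= \frac{1}{2} - \frac{3}{\pi^2} \approx 0.196.
\end{aligned}
```

If that assumed target is the one used, the OLS values 0.205, 0.199, 0.198 are already close to this limit. 1-NN has no such fixed limit above zero: for a continuous target, the nearest neighbor of x gets arbitrarily close as N grows, so its "g bar" approaches f and the bias tends to 0.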
__________________
Have faith in probability
#3
Quote:
Just one more question: in the quote above, when you said "there's some best …"
#4
Quote: