LFD Book Forum: Chapter 2 - Training versus Testing
Thread: bias-variance plot on p67 (http://book.caltech.edu/bookforum/showthread.php?t=4767)

Steve_Y 05-24-2017 10:33 PM

bias-variance plot on p67
 
Hi Prof. Abu-Mostafa,

As you suggested, I'm posting below the question that I emailed you earlier, in case other people have similar questions. However, I couldn't seem to insert/upload images properly here (only a link showed up), so I'll make this a text-only question.

Specifically, I'm a little confused about the bias-variance plot at the bottom of page 67. In the plot, the bias appears to be a flat line, i.e. constant, independent of the sample (training set) size N. I wondered whether this is (approximately) true in general, so I ran some simulations. What I found was that while this was indeed approximately true for linear regression, it did not appear to hold when I used the 1-nearest-neighbor (1-NN) algorithm. (Similar to Example 2.8, I tried to learn a sinusoid.)

More specifically, for linear regression, the averaged learned hypothesis, i.e. "g bar", stays almost unchanged when the size of the training set (N) increases from 4 to 10 in my simulation. Even for N=2, "g bar" doesn't deviate too much.

However, for the 1-Nearest-Neighbor (1-NN) algorithm, "g bar" changes considerably as N grows from 2 to 4, and to 10. This seems reasonable to me though, because as N increases, the distance between a test point (x) and its nearest neighbor decreases, with high probability. So it's natural to expect "g bar" to converge to the sinusoid, and the bias to decrease as N increases.

Here are the simulated average (squared) biases for N = 2, 4, and 8, where OLS stands for ordinary least squares linear regression:

N      2      4      8
OLS    0.205  0.199  0.198
1-NN   0.184  0.052  0.013

Do these results and interpretations look correct to you, or am I mistaken somewhere? I'd greatly appreciate it if you'd clarify this a little more for me. Thanks a lot!

BTW, in my simulation, the training set of size N is sampled independently and uniformly on the [0,1] interval. I then averaged the learned hypotheses from 5000 training sets to obtain each "g bar".
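
For concreteness, here is a minimal sketch of this simulation in Python. Two assumptions the description above leaves open are made explicit in the code: the target sinusoid is taken to be f(x) = sin(pi*x) with noiseless labels, and the helper names (fit_ols, predict_1nn, avg_bias) are just for illustration.

Code:

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Target sinusoid (assumed form; the post does not specify it).
    return np.sin(np.pi * x)

def fit_ols(X, y):
    # Least-squares line h(x) = a*x + b; returns (a, b).
    A = np.column_stack([X, np.ones_like(X)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_1nn(X, y, x):
    # Predict the label of the nearest training point for each test point.
    idx = np.abs(x[:, None] - X[None, :]).argmin(axis=1)
    return y[idx]

def avg_bias(N, n_trials=5000, n_test=1000):
    # Estimate "g bar" by averaging over many training sets, then
    # average (g_bar(x) - f(x))^2 over x to estimate the bias.
    x_test = np.linspace(0.0, 1.0, n_test)
    g_ols = np.zeros(n_test)
    g_1nn = np.zeros(n_test)
    for _ in range(n_trials):
        X = rng.uniform(0.0, 1.0, N)      # training inputs, uniform on [0,1]
        y = f(X)                          # noiseless labels
        a, b = fit_ols(X, y)
        g_ols += a * x_test + b
        g_1nn += predict_1nn(X, y, x_test)
    g_ols /= n_trials
    g_1nn /= n_trials
    bias_ols = np.mean((g_ols - f(x_test)) ** 2)
    bias_1nn = np.mean((g_1nn - f(x_test)) ** 2)
    return bias_ols, bias_1nn

for N in (2, 4, 8):
    b_ols, b_1nn = avg_bias(N)
    print(f"N={N}: OLS bias={b_ols:.3f}, 1-NN bias={b_1nn:.3f}")

The qualitative pattern this sketch is meant to check is the one reported above: the OLS bias stays roughly flat as N grows, while the 1-NN bias shrinks.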

Best regards,
Steve

magdon 05-25-2017 05:07 AM

Re: bias-variance plot on p67
 
Your observations are correct. The bias is only approximately constant. Only for a linear model and linear target is the bias constant. In general, the bias converges very quickly to a constant. This is because there is some "best" h^* and for any N, the final output g will be "scattered" around this h^*, sometimes predicting above h^* on a particular x and sometimes below, on average giving the prediction of h^*. This results in \bar g being approximately h^* for any N.
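
For reference, here is the decomposition being discussed, in the book's notation (MSE error measure, noiseless target f):

\mathbb{E}_{\mathcal{D}}\big[E_{\text{out}}(g^{(\mathcal{D})})\big] = \underbrace{\mathbb{E}_x\big[(\bar{g}(x) - f(x))^2\big]}_{\text{bias}} + \underbrace{\mathbb{E}_x\big[\mathbb{E}_{\mathcal{D}}\big[(g^{(\mathcal{D})}(x) - \bar{g}(x))^2\big]\big]}_{\text{var}}, \qquad \bar{g}(x) = \mathbb{E}_{\mathcal{D}}\big[g^{(\mathcal{D})}(x)\big].

Since \bar g \approx h^* for every N, the bias is approximately \mathbb{E}_x\big[(h^*(x) - f(x))^2\big], which does not depend on N.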

The above discussion does not hold for nonparametric models like Nearest Neighbor, which do not fit the paradigm of a fixed hypothesis set. When N increases, the "hypothesis set" gets more "complex", and so the bias decreases with N (as your very nice experiment verifies). I congratulate you on delving deeper into the bias-variance decomposition and discovering this subtle phenomenon. If you would like to know more, you may refer to the section on Parametric versus Nonparametric models in e-Chapter 6, and also the discussion of the self-regularizing property of Nearest Neighbor just before Section 6.2.2, where we show some pictures to illustrate how the Nearest Neighbor hypothesis gets "more complicated" as you increase N.
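
To make the Nearest Neighbor limit explicit (a sketch, assuming a continuous target and noiseless data): the nearest-neighbor distance goes to zero as N grows, so

\bar{g}_N(x) = \mathbb{E}_{\mathcal{D}}\big[f(x_{\mathrm{nn}}(x))\big] \longrightarrow f(x) \quad \text{as } N \to \infty,

where x_{\mathrm{nn}}(x) denotes the training point nearest to x, and hence the bias \mathbb{E}_x\big[(\bar{g}_N(x) - f(x))^2\big] \to 0.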

Steve_Y 05-25-2017 10:20 PM

Re: bias-variance plot on p67
 
Quote:

Originally Posted by magdon (Post 12666)
This is because there is some "best" h^* and for any N, the final output g will be "scattered" around this h^*, sometimes predicting above h^* on a particular x and sometimes below, on average giving the prediction of h^*. This results in \bar g being approximately h^* for any N.

Thank you very much, Prof. Magdon-Ismail, for the clarification, pointers, and encouragement!

Just one more question: in the quote above, when you said "there's some best h^*", did you mean the best h^* in the current hypothesis set \cal H for the current error measure, independent of N? For example, if \cal H consists of linear models and the error measure is mean squared error, then h^* would be the LMMSE estimate? Thanks a lot!

magdon 05-30-2017 05:46 AM

Re: bias-variance plot on p67
 
Quote:

Originally Posted by Steve_Y (Post 12668)
Thank you very much, Prof. Magdon-Ismail, for the clarification, pointers, and encouragement!

Just one more question: in the quote above, when you said "there's some best h^*", did you mean the best h^* in the current hypothesis set \cal H for the current error measure, independent of N? For example, if \cal H consists of linear models and the error measure is mean squared error, then h^* would be the LMMSE estimate? Thanks a lot!

Yes, the best h^* in the current hypothesis set \cal H (and \cal H should be fixed as you increase N). Note that bias and var are only defined for the MSE error measure, and yes, h^* would be the least-MSE estimate available within \cal H (so h^* is independent of N).
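
In symbols (restating the above, in the noiseless setting of the experiment in this thread):

h^* = \operatorname{arg\,min}_{h \in \mathcal{H}} \mathbb{E}_x\big[(h(x) - f(x))^2\big], \qquad \text{bias} = \mathbb{E}_x\big[(\bar{g}(x) - f(x))^2\big] \approx \mathbb{E}_x\big[(h^*(x) - f(x))^2\big],

and the right-hand side does not depend on N.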

