LFD Book Forum  

Chapter 2 - Training versus Testing
#1  05-24-2017, 10:33 PM
Steve_Y
bias-variance plot on p67

Hi Prof. Abu-Mostafa,

As you suggested, I'm posting below the question I emailed you earlier, in case other people have similar questions. I couldn't seem to insert/upload images properly here (only a link showed up), so this is a text-only version.

Specifically, I'm a little confused about the bias-variance plot at the bottom of page 67. In the plot, the bias appears to be a flat line, i.e. constant, independent of the training set size N. I wondered whether this is (approximately) true in general, so I ran some simulations. What I found was that while this is indeed approximately true for linear regression, it didn't appear to hold for the 1-nearest-neighbor (1-NN) algorithm. (Similar to Example 2.8, I tried to learn a sinusoid.)

More specifically, for linear regression, the averaged learned hypothesis \bar g stays almost unchanged when the training set size N increases from 4 to 10 in my simulation. Even for N=2, \bar g doesn't deviate much.

However, for the 1-NN algorithm, \bar g changes considerably as N grows from 2 to 4, and to 10. This seems reasonable to me, though: as N increases, the distance between a test point x and its nearest neighbor decreases with high probability, so it's natural to expect \bar g to converge to the sinusoid, and the bias to decrease, as N increases.

Here's the simulated average (squared) bias for N = 2, 4, and 8:
OLS:  0.205, 0.199, 0.198
1-NN: 0.184, 0.052, 0.013
where OLS stands for ordinary least squares linear regression.

Do these results and interpretations look correct to you, or am I mistaken somewhere? I'd greatly appreciate it if you'd clarify this a little more for me. Thanks a lot!

BTW, in my simulation, each training set of size N is sampled independently and uniformly on the interval [0,1]. I then averaged the hypotheses learned from 5000 training sets to obtain each \bar g.
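For concreteness, here is a minimal Python sketch of this kind of simulation. The particular sinusoid f(x) = sin(2*pi*x), the noiseless labels, and the fixed test grid are my own assumptions; the experiment I actually ran may have differed in those details.

[CODE]
import numpy as np

# Assumptions (not stated above): target f(x) = sin(2*pi*x) on [0,1],
# noiseless labels, bias estimated on a fixed grid of test points.
rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)

def ols_predict(x_train, y_train, x_test):
    # Fit y = a*x + b by ordinary least squares, then predict on x_test.
    a, b = np.polyfit(x_train, y_train, deg=1)
    return a * x_test + b

def nn1_predict(x_train, y_train, x_test):
    # 1-NN: each test point copies the label of its nearest training point.
    idx = np.abs(x_test[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[idx]

def squared_bias(predict, N, trials=5000, n_test=200):
    x_test = np.linspace(0.0, 1.0, n_test)
    g_bar = np.zeros(n_test)
    for _ in range(trials):
        x_train = rng.uniform(0.0, 1.0, N)    # training set of size N
        g_bar += predict(x_train, f(x_train), x_test)
    g_bar /= trials                           # average hypothesis \bar g
    return np.mean((g_bar - f(x_test)) ** 2)  # E_x[(\bar g(x) - f(x))^2]

for N in (2, 4, 8):
    print(N, squared_bias(ols_predict, N), squared_bias(nn1_predict, N))
[/CODE]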

Best regards,
Steve
#2  05-25-2017, 05:07 AM
magdon (RPI)
Re: bias-variance plot on p67

Your observations are correct. The bias is only approximately constant; it is exactly constant only for a linear model and a linear target. In general, the bias converges very quickly to a constant. This is because there is some "best" h^*, and for any N the final output g will be "scattered" around this h^*, sometimes predicting above h^* on a particular x and sometimes below, on average giving the prediction of h^*. This results in \bar g being approximately h^* for any N.
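For reference, here is the decomposition in the book's notation, assuming a noiseless target f and squared error:

\bar{g}(x) = \mathbb{E}_{\mathcal{D}}\!\left[ g^{(\mathcal{D})}(x) \right], \qquad
\mathrm{bias} = \mathbb{E}_{x}\!\left[ \left( \bar{g}(x) - f(x) \right)^2 \right], \qquad
\mathrm{var} = \mathbb{E}_{x}\,\mathbb{E}_{\mathcal{D}}\!\left[ \left( g^{(\mathcal{D})}(x) - \bar{g}(x) \right)^2 \right],

so that \mathbb{E}_{\mathcal{D}}[E_{\mathrm{out}}(g^{(\mathcal{D})})] = \mathrm{bias} + \mathrm{var}. With \bar g \approx h^* for all N, the bias is approximately the (constant) error of h^*, which is why the bias curve on page 67 looks flat.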

The above discussion does not hold for nonparametric models like nearest neighbor, which do not fit the paradigm of a fixed hypothesis set: when N increases, the "hypothesis set" gets more "complex", and so the bias decreases with N (as your very nice experiment verifies). I congratulate you on delving deeper into the bias-variance decomposition and discovering this subtle phenomenon. If you would like to know more, see the section on parametric versus nonparametric models in e-Chapter 6, and also the discussion of the self-regularizing property of nearest neighbor just before Section 6.2.2, where we show pictures illustrating how the nearest neighbor hypothesis gets "more complicated" as you increase N.
__________________
Have faith in probability
#3  05-25-2017, 10:20 PM
Steve_Y
Re: bias-variance plot on p67

Quote:
Originally Posted by magdon
This is because there is some "best" h^* and for any N, the final output g will be "scattered" around this h^*, sometimes predicting above h^* on a particular x and sometimes below, on average giving the prediction of h^*. This results in \bar g being approximately h^* for any N.
Thank you very much, Prof. Magdon-Ismail, for the clarification, pointers, and encouragement!

Just one more question: in the quote above, when you said "there's some best h^*", did you mean the best h^* in the current hypothesis set \cal H for the current error measure, independent of N? For example, if \cal H consists of linear models and the error measure is mean squared error, then h^* would be the LMMSE (linear minimum mean-squared-error) estimate? Thanks a lot!
#4  05-30-2017, 05:46 AM
magdon (RPI)
Re: bias-variance plot on p67

Quote:
Originally Posted by Steve_Y
Thank you very much, Prof. Magdon-Ismail, for the clarification, pointers, and encouragement!

Just one more question: in the quote above, when you said "there's some best h^*", did you mean the best h^* in the current hypothesis set \cal H for the current error measure, independent of N? For example, if \cal H consists of linear models and the error measure is mean squared error, then h^* would be the LMMSE (linear minimum mean-squared-error) estimate? Thanks a lot!
Yes, the best h^* in the current hypothesis set \cal H (and \cal H should stay fixed as you increase N). Note that bias and var are only defined for the MSE error measure; and yes, h^* would be the least-MSE estimate available within \cal H, so h^* is independent of N.
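In symbols, for a noiseless target f (with a noisy target, replace f(x) by y and average over the noise as well):

h^* = \arg\min_{h \in \mathcal{H}} \; \mathbb{E}_{x}\!\left[ \left( h(x) - f(x) \right)^2 \right],

which depends only on \cal H, f, and the input distribution, not on N.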
__________________
Have faith in probability