LFD Book Forum > Book Feedback - Learning From Data > Chapter 4 - Overfitting
#1  08-22-2012, 01:16 PM
tadworthington (Member; Join Date: Jun 2012; Location: Chicago, IL; Posts: 32)

What happens when error cannot be computed (is infinite) with leave-one-out CV?

I thought of this while working on the homework for the class. Say I have three points: (-1,0), (1,0), and (1,1). I want to fit them with a linear model, h(x) = mx + b, and I use LOO to compute my cross-validation error. The problem becomes apparent right away:

Code:
Leave out (-1,0), and fit (1,0), (1,1).  Fitting gives a vertical line, x = 1.

Of course, I am now unable to compute the squared error for the point (-1,0) that was left out; the error will be infinite.

Is the solution that I can't choose a vertical line (x = k, for some k) when fitting the data?
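
For concreteness, here is a minimal sketch of that fold, assuming Python/numpy. Since both retained points share x = 1, the design matrix is singular, and what comes back depends on how the solver handles the degeneracy; np.linalg.lstsq, for instance, returns the minimum-norm least-squares solution rather than a vertical line:

Code:
import numpy as np

# The three points from the example.
X = np.array([-1.0, 1.0, 1.0])
Y = np.array([0.0, 0.0, 1.0])

# Leave out (-1, 0); fit h(x) = m*x + b to (1, 0) and (1, 1).
# The design matrix [[1, 1], [1, 1]] is singular: both points share x = 1.
A = np.column_stack([X[1:], np.ones(2)])
m, b = np.linalg.lstsq(A, Y[1:], rcond=None)[0]

# lstsq returns the minimum-norm least-squares solution (m = b = 0.25),
# not a vertical line, so this particular fold happens to get a finite
# validation error on the held-out point.
e_val = (m * X[0] + b - Y[0]) ** 2
print(m, b, e_val)  # 0.25 0.25 0.0

So in code the fold may quietly return a finite answer even though the least-squares problem itself has no unique solution.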
#2  08-22-2012, 01:18 PM
tadworthington (Member; Join Date: Jun 2012; Location: Chicago, IL; Posts: 32)

Re: What happens when error cannot be computed (is infinite) with leave-one-out CV?

I think my question didn't make sense. Of course I can't get a vertical line from a hypothesis of the form h(x) = mx + b.

Too bad there is no delete on the forum.
#3  08-22-2012, 04:57 PM
magdon (RPI; Join Date: Aug 2009; Location: Troy, NY, USA; Posts: 595)

Re: What happens when error cannot be computed (is infinite) with leave-one-out CV?

As posed, the LOO error is indeed not defined (infinite); however, your question is interesting when your last data point is (say) (1+\epsilon, 1).

By choosing \epsilon appropriately small, you can make the LOO error arbitrarily large.

However, there is no problem with that; remember that your LOO error is an estimate of your E_{out} when learning with N-1 points. If your distribution can generate the two points (1+\epsilon, 1) and (1,0) with high probability (which is verified by the very existence of this data set), then indeed the out-of-sample error you should expect when learning from 2 data points is very large.
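
A minimal sketch (plain Python) of how fast that LOO error grows as \epsilon shrinks; the slope 1/\epsilon and intercept -1/\epsilon are just the line through the two retained points (1, 0) and (1+\epsilon, 1):

Code:
# Fit h(x) = m*x + b exactly through (1, 0) and (1 + eps, 1),
# then compute the squared error on the held-out point (-1, 0).
for eps in [1.0, 0.1, 0.01, 0.001]:
    m = 1.0 / eps            # slope of the line through the two points
    b = -m                   # intercept: the line passes through (1, 0)
    e_val = (m * (-1) + b - 0.0) ** 2
    print(eps, e_val)        # e_val = (2/eps)^2: 4, 400, 40000, ...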

__________________
Have faith in probability

Tags: cross-validation
