LFD Book Forum  

  #1  
Old 05-10-2013, 08:02 AM
jlaurentum
Member
 
Join Date: Apr 2013
Location: Venezuela
Posts: 41
Restricted Learner's Rule of Thumb (Lecture 11)

Hello All:

At minute 23:45 of Lecture 11, the restricted learner's reasoning is based on a rule of thumb whereby you should have 10 data points for every parameter you want to estimate in your model. Where (in the book or in the other lectures) can I find more information on the justification for this rule of thumb?
  #2  
Old 05-10-2013, 08:56 AM
yaser
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,474
Re: Restricted Learner's Rule of Thumb (Lecture 11)

Quote:
Originally Posted by jlaurentum (see post #1 above)
Let me first point to that part of the lecture using the lecture tag:

[Embedded clip from Lecture 11, starting around 23:45]
The rule of thumb is a practical observation, so its real justification is simply that it has worked most of the time in practice. One can justify the general form, namely that the number of examples should be a multiple of the VC dimension, by arguing that having several data points to fit per degree of freedom forces that degree of freedom into a 'compromise' that is likely to capture what is common between those data points, i.e., likely to generalize. Whether that multiple is 5 or 10 or 100, however, is an empirical observation that is difficult to reason about in a rigorous way.
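
For concreteness, here is a minimal numerical sketch of the heuristic (plain Python/NumPy of my own construction, not from the lecture or the book): least-squares linear regression with d + 1 parameters on a noisy linear target, showing the gap between in-sample and out-of-sample error shrinking as N grows past a few multiples of the VC dimension.

Code:
import numpy as np

rng = np.random.default_rng(0)

def generalization_gap(d, n, trials=200, noise=0.1):
    """Average |E_out - E_in| for least-squares linear regression
    on a random noisy linear target in d dimensions (toy setup)."""
    gaps = []
    for _ in range(trials):
        w_true = rng.standard_normal(d + 1)              # true weights (incl. bias term)

        def sample(n_pts):
            X = np.column_stack([np.ones(n_pts), rng.standard_normal((n_pts, d))])
            y = X @ w_true + noise * rng.standard_normal(n_pts)
            return X, y

        X_tr, y_tr = sample(n)                           # training set of size n
        X_te, y_te = sample(2000)                        # large test set approximates E_out
        w = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]   # least-squares fit
        e_in = np.mean((X_tr @ w - y_tr) ** 2)
        e_out = np.mean((X_te @ w - y_te) ** 2)
        gaps.append(abs(e_out - e_in))
    return np.mean(gaps)

d = 10                       # linear model: d_vc = d + 1 parameters
for mult in (1, 2, 5, 10, 20):
    n = mult * (d + 1)
    print(f"N = {mult:>2} x d_vc = {n:>3}:  mean |E_out - E_in| = {generalization_gap(d, n):.4f}")

Running this shows the gap dropping sharply once N is several times the VC dimension, which is the qualitative behavior the rule of thumb is pointing at; where exactly to draw the line (5x, 10x, ...) is the empirical part.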
__________________
Where everyone thinks alike, no one thinks very much
  #3  
Old 05-10-2013, 10:20 AM
jlaurentum
Member
 
Join Date: Apr 2013
Location: Venezuela
Posts: 41
Re: Restricted Learner's Rule of Thumb (Lecture 11)

Thank you for the answer, Professor. So, if I understand correctly, it is not possible (even based on your experience) to give a single value for this multiple, because no single value can cover all possible modeling situations (target complexity, stochastic/deterministic noise, etc.)?
  #4  
Old 05-10-2013, 11:36 AM
yaser
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,474
Re: Restricted Learner's Rule of Thumb (Lecture 11)

Quote:
Originally Posted by jlaurentum (see post #3 above)
Correct. It does depend on the situation.
__________________
Where everyone thinks alike, no one thinks very much
  #5  
Old 05-11-2013, 02:23 AM
Elroch
Invited Guest
 
Join Date: Mar 2013
Posts: 143
Re: Restricted Learner's Rule of Thumb (Lecture 11)

On a specific point raised by jlaurentum: I have come to the intuitive conclusion that noise effectively reduces the size of the data set, so the more noise there is, the more data points are needed to achieve the same results.

This is related to the much simpler idea that if you want to estimate a mean from noisy (i.e. non-zero variance) data, then the error of the estimate is inversely proportional to the square root of the number of data points. Likewise, I would conjecture that a noisy machine learning problem might be reduced to a nearly noiseless one by having a very large number of data points (although the quantitative details of this are less clear).
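
As a quick sanity check of that first point, here is a hedged sketch in plain Python/NumPy (my own toy example, not from the course): the spread of the sample mean of noisy data shrinks like sigma/sqrt(N).

Code:
import numpy as np

rng = np.random.default_rng(1)

sigma = 2.0                  # assumed standard deviation of the noise
for n in (10, 100, 1000):
    # Compute the sample mean of n noisy observations, repeated 5000 times,
    # and compare the spread of those means with the sigma/sqrt(n) prediction.
    means = rng.normal(loc=0.0, scale=sigma, size=(5000, n)).mean(axis=1)
    print(f"N = {n:>4}:  std of sample mean = {means.std():.4f},  sigma/sqrt(N) = {sigma / np.sqrt(n):.4f}")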

In machine learning there is the complication that this intuition only applies to genuine (stochastic) noise. Since "deterministic noise" is unvarying, it is not reduced by more data (although the variance is). On reflection, I feel the term "deterministic noise" can be a little misleading, as it is a form of error which merely mimics noise to an observer but lacks one of its properties (randomness). As an analogy with a physical measurement, it is closer to a calibration error than to an uncertainty in the measurement.
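
To illustrate that distinction, a toy sketch in plain Python/NumPy (again my own construction, with an assumed quadratic target f(x) = x^2 and a linear hypothesis set): the bias part of the error, which plays the role of deterministic noise, stays roughly constant as N grows, while the variance part shrinks.

Code:
import numpy as np

rng = np.random.default_rng(2)

def line_fit_bias_variance(n, noise=0.2, trials=500):
    """Fit a straight line to noisy samples of the assumed target f(x) = x^2
    on [-1, 1], many times, and split the error into a bias part
    (deterministic noise) and a variance part."""
    x_grid = np.linspace(-1, 1, 201)
    fits = np.empty((trials, x_grid.size))
    for t in range(trials):
        x = rng.uniform(-1, 1, n)
        y = x**2 + noise * rng.standard_normal(n)
        slope, intercept = np.polyfit(x, y, deg=1)      # least-squares line
        fits[t] = slope * x_grid + intercept
    mean_fit = fits.mean(axis=0)
    bias = np.mean((mean_fit - x_grid**2) ** 2)         # does not vanish as n grows
    variance = np.mean(fits.var(axis=0))                # shrinks roughly like 1/n
    return bias, variance

for n in (5, 20, 100, 1000):
    bias, variance = line_fit_bias_variance(n)
    print(f"N = {n:>4}:  bias = {bias:.3f},  variance = {variance:.4f}")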

Tags
overfitting, rule of thumb


