LFD Book Forum: overfitting and spurious final hypothesis
#1
05-19-2014, 10:20 PM
 sasin324 Junior Member Join Date: May 2014 Posts: 2
overfitting and spurious final hypothesis

Based on pages 124-125 of the book:
"On a finite data set, the algorithm inadvertently uses some of the degree of freedom to fit the noise, which can result in overfitting and a spurious final hypothesis."
I have some questions based on this sentence:
1. What is a spurious hypothesis? How can we identify one?
2. Is there any relationship between the overfitting phenomenon and a spurious hypothesis?
3. Does a spurious hypothesis come from the impact of deterministic noise in the data set?

I have been stuck for a while on how to define a spurious hypothesis and how to identify it from the model.

Best Regards,
#2
05-20-2014, 12:34 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: overfitting and spurious final hypothesis

The expression "spurious final hypothesis" is informal. When you fit the noise in sample, whether it is stochastic or deterministic, this takes you away from the desired hypothesis out of sample, since the 'extrapolation' of noise has nothing to do with the desired hypothesis. What you end up with is a spurious (not genuine or authentic) hypothesis.

This is indeed an overfitting phenomenon since fitting the noise is what overfitting is about. Validation can identify overfitting by detecting that the error is getting worse out of sample while we are having a better fit in sample.
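As a rough illustration of the validation check described above, here is a minimal sketch in Python with NumPy. Everything specific in it is an assumption made for the example and does not come from the book: the sine target, the noise level, the sample sizes, and the use of polynomial degree as the complexity knob. It fits polynomials of increasing degree to a small noisy sample and reports the error both on that sample and on a held-out validation set.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Hypothetical target function, chosen only for illustration.
    return np.sin(np.pi * x)

def make_data(n, noise_std=0.3):
    # Noisy samples of the target; noise_std is an assumed noise level.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = target(x) + rng.normal(0.0, noise_std, size=n)
    return x, y

x_train, y_train = make_data(15)     # small in-sample set
x_val, y_val = make_data(1000)       # held-out validation set

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = []
for degree in range(1, 11):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    results.append((degree,
                    mse(coeffs, x_train, y_train),
                    mse(coeffs, x_val, y_val)))

for degree, e_in, e_val in results:
    print(f"degree={degree:2d}  E_in={e_in:.4f}  E_val={e_val:.4f}")

# The degree with the lowest validation error is the one validation would pick.
best_degree = min(results, key=lambda r: r[2])[0]
print("degree chosen by validation:", best_degree)
```

In a run like this, the in-sample error keeps shrinking as the degree grows, while the validation error typically bottoms out at a moderate degree and then rises. That gap is the signature of fitting the noise.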
__________________
Where everyone thinks alike, no one thinks very much
#3
05-20-2014, 02:57 PM
 sasin324 Junior Member Join Date: May 2014 Posts: 2
Re: overfitting and spurious final hypothesis

Thanks for your response. It answers my questions very clearly.
However, I am still confused about overfitting and noise.

Suppose I fit the noise in the sample. Does this noise always introduce additional parameters into my model, i.e. does the model end up with unnecessary parameters that overfit the sample?

Is it possible that an additional parameter in a model comes from a spurious relationship (between parameters) that appears in the sample only by chance and does not appear in out-of-sample data, e.g. people born in December being more likely to have cancer, and that this leads to overfitting?

Could feature selection help mitigate the overfitting problem?

Best Regards
#4
06-19-2014, 06:37 AM
 magdon RPI Join Date: Aug 2009 Location: Troy, NY, USA. Posts: 597
Re: overfitting and spurious final hypothesis

The number of parameters in your model (to describe a hypothesis) is fixed before you see the data. A more complex model with many parameters increases your ability to fit the noise (usually more so than your ability to fit the true information in the data). This leads to overfitting.

One effect of feature selection is to reduce the number of parameters, which usually helps with overfitting.
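To make the point about parameter count concrete, here is a small sketch in Python with NumPy. It is an assumed setup, not something taken from the book: the target depends on a single genuine feature, and the remaining columns are irrelevant random features, loosely in the spirit of the "born in December" example. The model is ordinary least squares on the first k columns, with k standing in for the number of features kept.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_features = 30, 2000, 25  # assumed sizes for illustration

def make_data(n):
    X = rng.normal(size=(n, n_features))
    # Only column 0 carries information; every other column is irrelevant.
    y = 2.0 * X[:, 0] + rng.normal(0.0, 1.0, size=n)
    return X, y

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

def fit_and_score(k):
    # Least-squares fit using only the first k features.
    w, *_ = np.linalg.lstsq(X_train[:, :k], y_train, rcond=None)
    e_in = float(np.mean((X_train[:, :k] @ w - y_train) ** 2))
    e_out = float(np.mean((X_test[:, :k] @ w - y_test) ** 2))
    return e_in, e_out

errors = {k: fit_and_score(k) for k in (1, 5, 15, 25)}
for k, (e_in, e_out) in errors.items():
    print(f"features={k:2d}  E_in={e_in:.3f}  E_out={e_out:.3f}")
```

With only 30 training points, each extra irrelevant feature gives the fit another way to match chance patterns in the sample, so the in-sample error falls while the out-of-sample error grows. Keeping only the first column, which is what an ideal feature selection would do here, avoids this.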

__________________
Have faith in probability


All times are GMT -7.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.