06-19-2014, 07:37 AM
magdon
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 597
Re: overfitting and spurious final hypothesis

The number of parameters in your model (the parameters that describe a hypothesis) is fixed before you see the data, so fitting the noise does not add parameters. A more complex model with many parameters has a greater ability to fit the noise, and that ability usually grows faster than its ability to fit the true information in the data. This is what leads to overfitting.
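
To make this concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration, not something from the discussion: the target y = x, the noise level sigma = 0.5, the sample sizes, and the piecewise-constant model whose parameters are the k bin means. The number of bins k is chosen before any data is seen. Because the bin partitions are nested, the in-sample error cannot increase as k grows; the out-of-sample error would typically stop improving and then get worse, since the extra parameters are mostly fitting noise.

```python
import random

def make_data(n, rng, sigma=0.5):
    """Sample n points with y = x + Gaussian noise (hypothetical target)."""
    xs = [rng.random() for _ in range(n)]
    ys = [x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

def fit_bins(xs, ys, k):
    """Fit a piecewise-constant model with k equal-width bins on [0, 1).

    The k bin means are the model's parameters; k is fixed in advance.
    """
    overall = sum(ys) / len(ys)
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(xs, ys):
        b = min(int(x * k), k - 1)
        sums[b] += y
        counts[b] += 1
    # Bins with no training points fall back to the overall mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]

def mse(params, xs, ys):
    """Mean squared error of the binned model on the given points."""
    k = len(params)
    total = sum((params[min(int(x * k), k - 1)] - y) ** 2 for x, y in zip(xs, ys))
    return total / len(xs)

def run_bins(seed=0, n_train=20, n_test=2000):
    """Return {k: (E_in, E_out)} for a few nested bin counts."""
    rng = random.Random(seed)
    train = make_data(n_train, rng)
    test = make_data(n_test, rng)
    results = {}
    for k in (1, 2, 4, 8, 16):
        params = fit_bins(train[0], train[1], k)
        results[k] = (mse(params, *train), mse(params, *test))
    return results

if __name__ == "__main__":
    for k, (e_in, e_out) in run_bins().items():
        print(f"k={k:2d}  E_in={e_in:.3f}  E_out={e_out:.3f}")
```

Running the script prints one line per k, so you can compare how E_in and E_out move as the parameter count grows. The exact numbers depend on the seed.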

One effect of feature selection is to reduce the number of parameters, which usually helps with overfitting.
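
A toy illustration of why fewer candidate features leave less room for a chance pattern, such as the birth-month example in the quoted question. This is only a sketch under assumptions of my own: every feature and the label are independent coin flips, so no real relationship exists, and the "model" simply picks the one feature that agrees best with the label in the sample. The size d of the candidate pool plays the role of model complexity here. With a larger pool, the best in-sample agreement can only go up, while out-of-sample agreement should hover around 50%.

```python
import random

def make_pool(n, d, rng):
    """n samples of d coin-flip features and a coin-flip label (all unrelated)."""
    features = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    labels = [rng.choice((-1, 1)) for _ in range(n)]
    return features, labels

def agreement(features, labels, j):
    """Fraction of samples on which feature j equals the label."""
    return sum(row[j] == y for row, y in zip(features, labels)) / len(labels)

def best_feature(features, labels, d):
    """Index of the feature, among the first d, that agrees best in-sample."""
    return max(range(d), key=lambda j: agreement(features, labels, j))

def run_pool(seed=0, n_train=30, n_test=2000, d_max=100):
    """Return {d: (in-sample agreement, out-of-sample agreement)}."""
    rng = random.Random(seed)
    train_x, train_y = make_pool(n_train, d_max, rng)
    test_x, test_y = make_pool(n_test, d_max, rng)
    results = {}
    for d in (1, 10, d_max):
        # Restricting the search to the first d features mimics choosing
        # a smaller feature set before looking at the data.
        j = best_feature(train_x, train_y, d)
        results[d] = (agreement(train_x, train_y, j),
                      agreement(test_x, test_y, j))
    return results

if __name__ == "__main__":
    for d, (acc_in, acc_out) in run_pool().items():
        print(f"d={d:3d}  in-sample={acc_in:.2f}  out-of-sample={acc_out:.2f}")
```

Note that the pool is restricted here without consulting the labels. The sketch does not cover the case where the features themselves are selected using the same sample.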

Originally Posted by sasin324
Thanks for your response. This is a very clear answer to my questions.
However, I am still confused about overfitting and noise.

Suppose I fit the noise in the sample. Does this noise always introduce additional parameters into my model, i.e., does the model have unnecessary parameters that overfit the sample?

Is it possible that an additional parameter in a model comes from a spurious relationship (between parameters) that appears only in a sample by chance, e.g. people born in December having a higher chance of cancer, and that this relationship, which does not appear in out-of-sample data, leads to overfitting?

Could feature selection help mitigate the overfitting problem?

Best Regards
Have faith in probability