Quote:
Originally Posted by sasin324
Based on the book, pages 124-125:
"On a finite data set, the algorithm inadvertently uses some of the degrees of freedom to fit the noise, which can result in overfitting and a spurious final hypothesis."
I have some questions based on this sentence:
1. What is a spurious hypothesis? How can we identify one?
2. Is there any relationship between the overfitting phenomenon and the spurious hypothesis?
3. Does the spurious hypothesis come from the impact of deterministic noise in the data set?
I have been stuck for a while trying to define a spurious hypothesis and to identify it from the model.
Best Regards,

The expression "spurious final hypothesis" is informal. When you fit the noise in sample, whether it is stochastic or deterministic, this takes you away from the desired hypothesis out of sample, since the 'extrapolation' of noise has nothing to do with the desired hypothesis. What you end up with is a spurious (not genuine or authentic) hypothesis.
This is indeed an overfitting phenomenon, since fitting the noise is what overfitting is about. Validation can identify overfitting by detecting that the out-of-sample error is getting worse while the in-sample fit keeps getting better.
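To make the validation check concrete, here is a minimal sketch, not taken from the book. Everything specific in it is an assumption chosen for illustration: the sine target, the Gaussian (stochastic) noise level, the sample sizes, the polynomial hypothesis sets, and the helper name fit_and_score. It fits polynomials of increasing degree to a small noisy sample and reports the in-sample error next to the error on a held-out validation set.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical setup: the target, noise level, and sample sizes are assumptions.
rng = np.random.default_rng(0)
NOISE_STD = 0.3


def target(x):
    """Assumed 'desired hypothesis' that generated the data."""
    return np.sin(np.pi * x)


def make_data(n):
    """Draw n noisy points from the assumed target on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, n)
    y = target(x) + rng.normal(0.0, NOISE_STD, n)
    return x, y


def fit_and_score(degree, x_train, y_train, x_val, y_val):
    """Least-squares polynomial fit; returns (in-sample MSE, validation MSE)."""
    coeffs = P.polyfit(x_train, y_train, degree)
    e_in = float(np.mean((P.polyval(x_train, coeffs) - y_train) ** 2))
    e_val = float(np.mean((P.polyval(x_val, coeffs) - y_val) ** 2))
    return e_in, e_val


x_train, y_train = make_data(15)   # small sample used for fitting
x_val, y_val = make_data(200)      # held-out points, never used for fitting

# More degrees of freedom as the polynomial degree grows.
results = {
    d: fit_and_score(d, x_train, y_train, x_val, y_val) for d in range(1, 13)
}

print("degree   E_in      E_val")
for d, (e_in, e_val) in results.items():
    print(f"{d:6d}   {e_in:.4f}   {e_val:.4f}")
```

The in-sample error can only go down as the degree increases, because each hypothesis set contains the previous one. The symptom to look for is the validation error turning upward while the in-sample error keeps shrinking: past that point the extra degrees of freedom are being spent on the noise, and the resulting hypothesis is the "spurious" one.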