 LFD Book Forum Figure 4.3(b)

#1
 physicsme Junior Member Join Date: Oct 2014 Posts: 2 Figure 4.3(b)

The title of Figure 4.3(b) is "Deterministic noise". However, the trend in which the overfitting level first decreases, hits a sweet spot, and then increases as Qf grows is the result not only of the deterministic noise, but of the stochastic noise as well. The reason the fixed hypothesis set overfits the data when the target function is extremely simple is the presence of stochastic noise. If we eliminated that, a 10th-order polynomial hypothesis set would fit a 2nd-order polynomial target function exactly.
In this sense, the title "Deterministic noise" of Figure 4.3(b) is a bit misleading.

Actually, I became aware of this while doing Exercise 4.3. At first I thought the answer should be "deterministic noise will go up all the way as target complexity increases"; then I looked at Fig. 4.3(b) and thought, "hey, it says deterministic noise will first go down and then go up!" But come to think of it, the figure really plots Eout(H10) - Eout(H2), which includes the effects of both stochastic and deterministic noise; hence this post.
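The quantity in question can be sketched in a few lines of NumPy. This is my own illustrative reconstruction, not the book's actual experiment: the target, sample size, noise level, and number of trials below are all made-up choices. It estimates the overfit measure Eout(H10) - Eout(H2) on a simple 2nd-order target, with and without stochastic noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def eout(coeffs, target, n_test=10_000):
    """Estimate out-of-sample squared error against the noiseless target."""
    x = rng.uniform(-1, 1, n_test)
    return np.mean((np.polyval(coeffs, x) - target(x)) ** 2)

def overfit_measure(target, n_train=20, sigma=0.5):
    """Eout(H10) - Eout(H2) for one noisy data set drawn from `target`."""
    x = rng.uniform(-1, 1, n_train)
    y = target(x) + sigma * rng.normal(size=n_train)
    g2 = np.polyfit(x, y, 2)    # best 2nd-order fit
    g10 = np.polyfit(x, y, 10)  # best 10th-order fit
    return eout(g10, target) - eout(g2, target)

target = lambda x: 1 - 2 * x + 3 * x**2  # a simple 2nd-order target

# With stochastic noise, H10 typically overfits (positive measure).
# With sigma = 0, H10 recovers the 2nd-order target essentially exactly,
# so the measure is ~0 -- the point made above.
noisy = np.mean([overfit_measure(target, sigma=0.5) for _ in range(200)])
clean = np.mean([overfit_measure(target, sigma=0.0) for _ in range(200)])
```

With no noise, the least-squares 10th-order fit on 20 exact quadratic samples reduces to the quadratic itself (up to numerical precision), which is why the "overfitting" in the simple-target region must come from stochastic noise.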
#2
 ypeels Member Join Date: Dec 2014 Posts: 17 Re: Figure 4.3(b)

An explanation by @yaser was posted on the 2014 edX forum:

Quote:
 Q. Why is the bottom part of this figure behaving differently?
 A. This is an artifact, and it has to do with our choice of models (2nd- and 10th-order polynomials). For Qf ≤ 10 there is no deterministic noise for H10 (we can fit such targets perfectly).
 Q. Why did we add stochastic noise to the target when generating the above figure; aren't we just analyzing deterministic noise?
 A. We wanted to compare the two figures fairly. When plotting the impact of the stochastic noise, we already had some built-in deterministic noise in our target function as well.
#3
 ypeels Member Join Date: Dec 2014 Posts: 17 Re: Figure 4.3(b)

Another question about this figure: is there any intuition as to why regions of a given color (fixed overfit measure) are roughly linear in the stochastic N-sigma^2 graph, but non-linear (exponential?) in the deterministic N-Qf graph (for Qf > 10)?

Thank you!
#4 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477 Re: Figure 4.3(b)

Quote:
 Originally Posted by ypeels Another question about this figure: is there any intuition as to why regions of a given color (fixed overfit measure) are roughly linear in the stochastic N-sigma^2 graph, but non-linear (exponential?) in the deterministic N-Qf graph (for Qf > 10)? Thank you!
The analysis of the stochastic noise figure may be doable given the clean analytic components of the simulation (Legendre polynomials, pseudo-inverse, Gaussian noise). In the deterministic noise figure, the noise level is quantified by the complexity Qf of the target f. While deterministic noise (the part of f that cannot be captured by H) is indeed related to Qf, it is not necessarily linearly related to it, so the direct parallel with what happens with stochastic noise does not hold.
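The simulation components mentioned above (Legendre target, Gaussian noise, pseudo-inverse fit) can be sketched as follows. This is a hedged reconstruction, not the book's code: the degree Qf = 15, sample size, noise level, number of trials, and the unnormalized random Legendre coefficients are all illustrative assumptions. It computes one cell of the deterministic-noise figure's (N, Qf) grid:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(1)

def random_legendre_target(qf):
    """Target f: random combination of Legendre polynomials up to degree qf.
    (Illustrative: the book normalizes the target's energy; this sketch does not.)"""
    coeffs = rng.normal(size=qf + 1)
    return lambda x: legendre.legval(x, coeffs)

def pinv_polyfit(x, y, degree):
    """Polynomial least-squares fit via the pseudo-inverse: w = Z^+ y."""
    Z = np.vander(x, degree + 1, increasing=True)  # feature matrix
    return np.linalg.pinv(Z) @ y

def eout(w, f, n_test=5_000):
    """Out-of-sample squared error of weight vector w against target f."""
    x = rng.uniform(-1, 1, n_test)
    Z = np.vander(x, len(w), increasing=True)
    return np.mean((Z @ w - f(x)) ** 2)

# One cell of the grid: overfit measure Eout(H10) - Eout(H2),
# averaged over independently drawn noisy data sets.
f = random_legendre_target(qf=15)
n, sigma = 80, 0.3
measures = []
for _ in range(100):
    x = rng.uniform(-1, 1, n)
    y = f(x) + sigma * rng.normal(size=n)
    measures.append(eout(pinv_polyfit(x, y, 10), f) - eout(pinv_polyfit(x, y, 2), f))
overfit = np.mean(measures)
```

Sweeping `n` and `qf` (or `n` and `sigma**2` for the stochastic figure) over a grid and coloring by `overfit` would reproduce the shape of the two panels being discussed.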

BTW, the LaTeX stuff is done with the delimiters [ math ] and [ /math ] (without the spaces) instead of $ ... $. A bit cumbersome when you use a lot of math.
__________________
Where everyone thinks alike, no one thinks very much


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.