LFD Book Forum  

LFD Book Forum > Book Feedback - Learning From Data > Chapter 4 - Overfitting

#1
06-16-2015, 06:01 AM
yusunchina (Junior Member; Join Date: Aug 2014; Posts: 6)
Exercise 4.10

Would anyone please give me a clue to part (c)? This seems rather counterintuitive to me.

Thanks a lot!
#2
10-14-2015, 10:12 AM
sayan751 (Junior Member; Join Date: Jun 2015; Posts: 5)
Re: Exercise 4.10

I think this is a possible explanation for Exercise 4.10(c):

When K = 1, the validation error is not a very good estimate of the out-of-sample error, because the penalty term for selecting among M models is largest when K is smallest. Thus, the model chosen from this poor estimate might not be the 'best' one. This explains why \mathbb{E}[E_{out}(g^{-}_{m^{*}})] < \mathbb{E}[E_{out}(g_{m^{*}})]. This situation improves somewhat as K increases.
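
If I recall the chapter correctly, the penalty term I have in mind comes from the validation bound (stated from memory, so the exact constants are my own approximation):

E_{out}(g^{-}_{m^{*}}) \leq E_{val}(g^{-}_{m^{*}}) + O\left(\sqrt{\frac{\ln M}{K}}\right)

so with K = 1 the penalty is at its largest and the validation error can be a very loose estimate of E_{out}.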

Please let me know if this explanation is not correct.
#3
10-14-2015, 10:18 AM
sayan751 (Junior Member; Join Date: Jun 2015; Posts: 5)
Re: Exercise 4.10

I also wanted to validate my explanation of the other parts of this exercise.

For part (b), this is what I think:
As K increases, the validation error becomes a better estimate of the out-of-sample error. That explains the initial decrease in \mathbb{E}[E_{out}(g_{m^{*}})]. Then, as K increases beyond the 'optimal' value, training on only N - K points degrades, which explains the rise.
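
One way I think about the two competing effects (my own summary, so the exact form is an assumption on my part): for a fixed model m, the validation error is an unbiased estimate of E_{out}(g^{-}_{m}) whose variance shrinks like 1/K, while g^{-}_{m} itself is trained on only N - K points and so tends to get worse as K grows:

\mathbb{E}_{\mathcal{D}_{val}}[E_{val}(g^{-}_{m})] = E_{out}(g^{-}_{m}), \qquad \mathrm{Var}[E_{val}(g^{-}_{m})] = \frac{\sigma^{2}(g^{-}_{m})}{K}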

Please let me know if my understanding is correct or not.

For part (a), I can't figure out the initial decrease in \mathbb{E}[E_{out}(g^{-}_{m^{*}})]. Any clue on this would be great.

Thanks,
Sayan
#4
05-04-2016, 12:26 PM
ntvy95 (Member; Join Date: Jan 2016; Posts: 37)
Re: Exercise 4.10

Well, I'm not sure about my understanding, but here are my guesses. (If they are not correct, please tell me, especially for (c).)

(a) Because g^{-}_{m^{*}} is the hypothesis with the smallest validation error E_{val} among the M hypotheses, and we already know that E_{val}(g^{-}_{m^{*}}) is close to E_{out}(g^{-}_{m^{*}}) for small M and large K, hence the initial decrease. As we set aside more data for validation, we use less data for training, and that leads to worse hypotheses from the M models, hence the subsequent increase.

(b) The reason for the initial decrease was already discussed above. A note here is that initially \mathbb{E}[E_{out}(g^{-}_{m^{*}})] is very close to \mathbb{E}[E_{out}(g_{m^{*}})]; this is because the size N - K of the training set used to output g^{-}_{m^{*}} is very close to the size N of the training set used to output g_{m^{*}}. Then it takes a rather long time for \mathbb{E}[E_{out}(g_{m^{*}})] to increase again despite the worsening of the M hypotheses, because those worsening hypotheses still lead us to a good enough choice of learning model, until they become so poor that they finally lead us to a worse choice of learning model.

(c) A possible case is that when K = 1, g^{-}_{m^{*}} and g_{m^{*}} have almost the same training set size, and hence almost the same chance of being a good final hypothesis; however, g^{-}_{m^{*}} has the guarantee of small \mathbb{E}[E_{out}(g^{-}_{m^{*}})] through small E_{val}(g^{-}_{m^{*}}), while g_{m^{*}} does not have this guarantee. However, as K increases, g^{-}_{m^{*}} is trained on less and less data compared to g_{m^{*}}, so g^{-}_{m^{*}}'s performance can no longer compete with g_{m^{*}}'s.
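
If it helps, here is a small simulation sketch of the kind of experiment I imagine behind the exercise's curves (the target function, noise level, model set, and sample sizes below are all my own assumptions, not the book's exact setup): fit M polynomial models on N - K points, pick m^{*} by validation error on the K held-out points, then compare E_{out}(g^{-}_{m^{*}}) with E_{out}(g_{m^{*}}) retrained on all N points, averaged over many runs.

[CODE]
import numpy as np

# Hypothetical setup: noisy sinusoidal target, squared error, polynomial models.
# All constants below are assumptions for illustration, not the book's experiment.
rng = np.random.default_rng(0)
N, M, SIGMA, TRIALS = 40, 5, 0.5, 500   # sample size, number of models, noise, runs

def target(x):
    return np.sin(np.pi * x)

def sample(n):
    x = rng.uniform(-1, 1, n)
    y = target(x) + SIGMA * rng.standard_normal(n)
    return x, y

def fit(x, y, degree):
    return np.polyfit(x, y, degree)           # least-squares polynomial fit

def e_out(coeffs, n_test=2000):
    x, y = sample(n_test)                      # fresh data approximates E_out
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

def run(K):
    eout_minus, eout_full = 0.0, 0.0
    for _ in range(TRIALS):
        x, y = sample(N)
        x_tr, y_tr = x[:N - K], y[:N - K]      # training set of size N - K
        x_val, y_val = x[N - K:], y[N - K:]    # validation set of size K
        # Train each model m on the reduced set and measure its validation error.
        models = [fit(x_tr, y_tr, m + 1) for m in range(M)]
        e_val = [np.mean((np.polyval(c, x_val) - y_val) ** 2) for c in models]
        m_star = int(np.argmin(e_val))         # model selected by validation
        eout_minus += e_out(models[m_star])                # E_out(g^-_{m*})
        eout_full += e_out(fit(x, y, m_star + 1))          # E_out(g_{m*}), retrained on all N
    return eout_minus / TRIALS, eout_full / TRIALS

for K in (1, 5, 10, 20, 30):
    em, ef = run(K)
    print(f"K={K:2d}  E_out(g^-_m*)={em:.3f}  E_out(g_m*)={ef:.3f}")
[/CODE]

Averaging over enough runs, a sweep like this should show the qualitative shapes discussed above (an initial decrease and a later rise), though the exact crossover points will of course depend on my assumed setup.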

Thank you.