LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Chapter 4 - Overfitting (http://book.caltech.edu/bookforum/forumdisplay.php?f=111)
-   -   Exercise 4.10 (http://book.caltech.edu/bookforum/showthread.php?t=4605)

yusunchina 06-16-2015 06:01 AM

Exercise 4.10
Would anyone please give me a clue to part (c)? This seems rather counterintuitive to me.

Thanks a lot!

sayan751 10-14-2015 10:12 AM

Re: Exercise 4.10
I think this can be a possible explanation for Exercise 4.10.(c):

When K = 1, the validation error is a poor estimate of the out-of-sample error, because the penalty term in the validation bound is at its largest when the validation set holds a single point. The model chosen from this poor estimate might therefore not be the ‘best’ one. This would explain Expectation[Out-of-Sample Error of g^-_(m*)] < Expectation[Out-of-Sample Error of g_(m*)]. The situation improves somewhat as K increases.

Please let me know if this explanation is not correct.
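
For reference, the ‘penalty term’ in the explanation above is presumably the one in the usual bound for selecting among M hypotheses with K validation points. A sketch in the notation used elsewhere in this thread, stated from memory of the book's model-selection discussion rather than quoted from it:

```latex
E_{out}(g^{-}_{m^{*}}) \le E_{val}(g^{-}_{m^{*}}) + O\!\left(\sqrt{\frac{\ln M}{K}}\right)
```

With K = 1 the second term is of order \sqrt{\ln M}, which is not small, so E_{val} says little about E_{out}; the term shrinks as K grows.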

sayan751 10-14-2015 10:18 AM

Re: Exercise 4.10
Also I wanted to validate my explanation of other parts of this exercise.

For part (b), this is what I think:
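As K increases, the validation error becomes a better estimate of the out-of-sample error, so the selected model is more likely to be a good one. That explains the initial decrease in Expectation[Out-of-Sample Error of g_(m*)]. Then, as K increases beyond the ‘optimal’ value, too few points (N - K) are left for training the candidate hypotheses, so the selection itself becomes unreliable, which explains the rise.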

Please let me know if my understanding is correct or not.

For part (a), I can't figure out the initial decrease in Expectation[Out-of-Sample Error of g^-_(m*)]. Any clue on this will be great.


ntvy95 05-04-2016 12:26 PM

Re: Exercise 4.10
Well, I'm not sure about my understanding, but here is my guess. (If anything is incorrect, please tell me, especially for (c).)

(a) g^{-}_{m^{*}} is the hypothesis with the smallest E_{val} among the M hypotheses, and we already know that E_{val}(g^{-}_{m^{*}}) is close to E_{out}(g^{-}_{m^{*}}) for small M and large K, hence the initial decrease. As we set aside more data for validation, we use less data for training, which leads to M worse hypotheses, hence the subsequent increase.

(b) The reason for the initial decrease is the same as in (a). Note that initially \mathbb{E}[E_{out}(g^{-}_{m^{*}})] is very close to \mathbb{E}[E_{out}(g_{m^{*}})], because the size N - K of the training set used to output g^{-}_{m^{*}} is very close to the size N of the training set used to output g_{m^{*}}. It then takes much longer for \mathbb{E}[E_{out}(g_{m^{*}})] to rise again, even though the M hypotheses keep getting worse: the worsening hypotheses still lead us to a good enough choice of learning model, until they become so bad that they finally lead us to a poor choice.

(c) One possible explanation: when K = 1, g^{-}_{m^{*}} and g_{m^{*}} are trained on almost the same amount of data (N - 1 versus N points), so they have almost the same chance of being a good final hypothesis. However, g^{-}_{m^{*}} comes with the guarantee that a small E_{val}(g^{-}_{m^{*}}) implies a small \mathbb{E}[E_{out}(g^{-}_{m^{*}})], while g_{m^{*}} has no such guarantee. As K increases, g^{-}_{m^{*}} is trained on less and less data compared to g_{m^{*}}, so its performance can no longer compete with that of g_{m^{*}}.

Thank you.
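
One way to check these guesses is to simulate the experiment directly. The following is only a minimal sketch under assumptions that are not part of the exercise: the target sin(pi x), Gaussian noise, polynomial models of degree 0 to 4, N = 30, and a noise-free test grid standing in for E_out are all hypothetical choices, and `simulate` is an illustrative name. For each K it averages E_out of g^- (trained on N - K points) and of g (the selected model retrained on all N points).

```python
import numpy as np

def simulate(N=30, degrees=(0, 1, 2, 3, 4), Ks=(1, 5, 10, 15, 20),
             trials=200, noise=0.3, seed=0):
    """Return {K: (avg E_out of g^-_{m*}, avg E_out of g_{m*})}.

    All settings are illustrative assumptions, not taken from the book.
    """
    rng = np.random.default_rng(seed)
    target = lambda x: np.sin(np.pi * x)   # assumed target function
    x_test = np.linspace(-1.0, 1.0, 401)   # dense grid used to approximate E_out
    y_test = target(x_test)

    def e_out(coeffs):
        # Mean squared error against the noise-free target on the grid.
        return float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))

    results = {}
    for K in Ks:
        minus_sum, full_sum = 0.0, 0.0
        for _ in range(trials):
            x = rng.uniform(-1.0, 1.0, N)
            y = target(x) + noise * rng.standard_normal(N)
            # Points are i.i.d., so the first K can serve as the validation set.
            x_val, y_val = x[:K], y[:K]
            x_tr, y_tr = x[K:], y[K:]
            # Train each of the M models on the N - K training points.
            fits = [np.polyfit(x_tr, y_tr, d) for d in degrees]
            e_val = [np.mean((np.polyval(c, x_val) - y_val) ** 2) for c in fits]
            best = int(np.argmin(e_val))           # index m* of the selected model
            minus_sum += e_out(fits[best])         # g^-_{m*}: trained on N - K points
            full_sum += e_out(np.polyfit(x, y, degrees[best]))  # g_{m*}: all N points
        results[K] = (minus_sum / trials, full_sum / trials)
    return results

if __name__ == "__main__":
    for K, (e_minus, e_full) in simulate().items():
        print(f"K={K:2d}  E_out(g^-)={e_minus:.4f}  E_out(g)={e_full:.4f}")
```

Plotting the two columns against K gives curves that can be compared with the figure in the exercise. Whether the ordering at K = 1 matches part (c) may depend on the assumed target, noise level, and model set, so it is worth varying them.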

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.