LFD Book Forum  

#1  02-20-2013, 05:18 PM
vikasatkin (Caltech)
Discussion of Lecture 13 "Validation"

Links: [Lecture 13 slides] [all slides] [Lecture 13 video]

Question: (Slide 7/22) 1. Why do we report g instead of g^{-}? 2. Why do we report E_{val}(g^{-}) instead of E_{val}(g)?

Answer: 1. Because from the theoretical analysis we know that the more points we have in the training set, the better the learning outcome. So it is better to train on all N points than on N-K, even though we can't measure how much better it is.
2. Because we can't report E_{val}(g): g was trained on all N points, so we have no points left to compute E_{val}(g) (we can't reuse any of these N points, because they are already contaminated).
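To make the procedure on slide 7/22 concrete, here is a minimal sketch in Python (my own illustration, not code from the course); fit and error are hypothetical placeholders for whatever learning algorithm and error measure you are using:

Code:
import numpy as np

def fit(X, y):
    # placeholder learner: least-squares linear regression
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def error(w, X, y):
    # placeholder error measure: mean squared error
    return float(np.mean((X @ w - y) ** 2))

def validate_and_report(X, y, K):
    N = len(y)
    # train on N - K points, validate on the remaining K
    g_minus = fit(X[:N - K], y[:N - K])
    e_val = error(g_minus, X[N - K:], y[N - K:])   # estimate of E_out(g_minus)
    # retrain on all N points: g is expected to be at least as good as g_minus,
    # but no clean points are left to evaluate it, so we quote E_val(g_minus)
    g = fit(X, y)
    return g, e_val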
#2  02-20-2013, 05:23 PM
vikasatkin (Caltech)
Discussion of Lecture 13 "Validation"

Question: (Slide 7/22) The rule of thumb for the validation set size is K = N/5. Why do we have N/10 on the last slide?

Answer: On the last slide we use cross-validation; on slide 7/22 we do validation just once, so it is a different game. It is clear that in cross-validation we can afford fewer validation points: we repeat the process and end up using all the points anyway, so we might as well increase N-K. The question is how much smaller K should be. We could take K=1 (as in leave-one-out), but if the dataset is large, that would take too much time. Actually, computation is not the only reason. It looks counterintuitive, but in some situations the leave-one-out cross-validation error has stronger fluctuations, so you would use 10-fold cross-validation even if you are not worried about computational resources; e.g. if you have just 100 examples, 10-fold cross-validation may give you better stability than leave-one-out.
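For concreteness, here is a small Python sketch of K-fold cross-validation (my own illustration, not code from the course); fit and error are again placeholders for the learner and error measure, and leave-one-out is just the special case where the number of folds equals N:

Code:
import numpy as np

def cross_validation_error(X, y, n_folds, fit, error):
    # average the validation errors over the folds;
    # n_folds = len(y) gives leave-one-out cross-validation
    idx = np.arange(len(y))
    errs = []
    for val_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, val_idx)
        g_minus = fit(X[train_idx], y[train_idx])
        errs.append(error(g_minus, X[val_idx], y[val_idx]))
    return float(np.mean(errs))

# placeholder learner / error measure (linear regression, squared error)
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
error = lambda w, X, y: float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=100)

e_10cv = cross_validation_error(X, y, 10, fit, error)       # 10-fold, K = N/10
e_loo  = cross_validation_error(X, y, len(y), fit, error)   # leave-one-out, K = 1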
#3  02-20-2013, 05:26 PM
vikasatkin (Caltech)
Re: Discussion of Lecture 13 "Validation"

Question: (Slide 11/22) If you already have all the hypotheses, why do you do validation and choose a model instead of doing aggregation?

Answer: In practice people often use aggregation, and it often does perform better.

If you have 100 points, you can train on all of them, or you can train on 99 points, leaving the n-th point out, and then average over n. Sometimes the second approach gives better results, despite the fact that in the end you still use the same 100 points. The reason is that this process may reduce variance and is therefore less affected by noise. A sketch of the idea is below.
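A minimal sketch of that leave-one-out aggregation with a linear model (my own illustration, not code from the course; for a linear model, averaging the weight vectors is the same as averaging the hypotheses' outputs):

Code:
import numpy as np

def fit(X, y):
    # placeholder learner: least-squares linear regression
    return np.linalg.lstsq(X, y, rcond=None)[0]

def aggregate_leave_one_out(X, y):
    N = len(y)
    # train N hypotheses, each with the n-th point left out, then average them
    ws = [fit(np.delete(X, n, axis=0), np.delete(y, n)) for n in range(N)]
    return np.mean(ws, axis=0)

# both use the same N points, but the average may have lower variance:
# g_single    = fit(X, y)
# g_aggregate = aggregate_leave_one_out(X, y)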