LFD Book Forum Discussion of Lecture 13 "Validation"

#1
02-20-2013, 04:18 PM
 vikasatkin Caltech Join Date: Sep 2011 Posts: 39
Discussion of Lecture 13 "Validation"

Links: [Lecture 13 slides] [all slides] [Lecture 13 video]

Question: (Slide 7/22) 1. Why do we report $g$ instead of $g^-$? 2. Why do we report $E_{\text{val}}(g^-)$ instead of $E_{\text{val}}(g)$?

Answer: 1. Theoretical analysis tells us that the more points we have in the dataset, the better the learning outcome. So it is better to train on N points than on N-K, although we can't measure how much better it is.
2. Because we can't report $E_{\text{val}}(g)$: $g$ was trained on all N points, so there are no other points left to compute it on (we can't reuse any of these N points, because they are already contaminated).
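The procedure described in this answer can be sketched as follows. This is only an illustration under assumed conditions: a linear least-squares model on synthetic data, with the hypothetical helper names `fit_linear`, `mse`, and `validate_then_retrain`, none of which come from the lecture. The validation error is measured on the hypothesis trained on N-K points, and the hypothesis that gets delivered is retrained on all N points.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit; returns the weight vector."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(w, X, y):
    """Mean squared error of the linear hypothesis w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

def validate_then_retrain(X, y, K, rng):
    """Hold out K points, estimate the error with g^-, then retrain on all N."""
    N = len(y)
    perm = rng.permutation(N)
    val_idx, train_idx = perm[:K], perm[K:]
    w_minus = fit_linear(X[train_idx], y[train_idx])  # g^-: trained on N-K points
    e_val = mse(w_minus, X[val_idx], y[val_idx])      # E_val(g^-): held-out points only
    w_full = fit_linear(X, y)                         # g: trained on all N points
    return w_full, e_val

# Synthetic data (an assumption for illustration): y = 0.5 + 2x + noise
rng = np.random.default_rng(0)
N, K = 100, 20  # K = N/5, the rule of thumb for a single validation split
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, N)])
y = X @ np.array([0.5, 2.0]) + 0.1 * rng.standard_normal(N)

w_full, e_val = validate_then_retrain(X, y, K, rng)
print("reported hypothesis g:", w_full)
print("reported estimate E_val(g^-):", e_val)
```

Note that `e_val` is never recomputed for `w_full`: every point has been used to train it, so no clean points remain.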
#2
02-20-2013, 04:23 PM
 vikasatkin Caltech Join Date: Sep 2011 Posts: 39
Discussion of Lecture 13 "Validation"

Question: (Slide 7/22) The rule of thumb is N/5. Why do we have N/10 on the last slide?

Answer: On the last slide we use cross-validation, while on slide 7/22 we do validation just once, so it is a different game. In cross-validation we should clearly use fewer points for validation: we repeat the process anyway and end up using all the points for validation, so we had better increase the training set size $N-K$. The question is how much smaller K should be. We might take K=1 (as in leave-one-out). Of course, if the dataset is large, that would take too much time. But this is not the only reason. It looks counterintuitive, but in some situations the leave-one-out cross-validation error has stronger fluctuations, so you would use 10-fold cross-validation even if you don't worry about computational resources. For example, if you have just 100 examples, 10-fold cross-validation may give you better stability than leave-one-out.
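To make the two options concrete, here is a minimal sketch of cross-validation where the number of folds is a parameter, so that 10-fold and leave-one-out are the same routine with different settings. The linear model, the synthetic data, and the function name `cross_validation_error` are assumptions for illustration. A single run like this does not demonstrate the stability claim; for that one would repeat it over many datasets and compare the spread of the two estimates.

```python
import numpy as np

def cross_validation_error(X, y, n_folds, rng):
    """Average validation error over n_folds disjoint validation sets."""
    N = len(y)
    folds = np.array_split(rng.permutation(N), n_folds)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(N, dtype=bool)
        train_mask[val_idx] = False  # train on the N-K points outside this fold
        w, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        errors.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic data (an assumption for illustration): 100 examples, y = 0.5 + 2x + noise
rng = np.random.default_rng(1)
N = 100
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, N)])
y = X @ np.array([0.5, 2.0]) + 0.1 * rng.standard_normal(N)

e_cv10 = cross_validation_error(X, y, 10, rng)  # K = N/10 = 10, 10 training runs
e_loo = cross_validation_error(X, y, N, rng)    # K = 1, N training runs
print("10-fold estimate:      ", e_cv10)
print("leave-one-out estimate:", e_loo)
```

The cost difference is visible in the loop count: leave-one-out fits the model N times, 10-fold only 10 times.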
#3
02-20-2013, 04:26 PM
 vikasatkin Caltech Join Date: Sep 2011 Posts: 39
Re: Discussion of Lecture 13 "Validation"

Question: (Slide 11/22) If you already have all the hypotheses, why do you do validation and choose a model instead of doing aggregation?

Answer: In practice people often use aggregation and often it does perform better.

If you have 100 points, you can train on all of them, or you can train 100 times on 99 points, leaving the n-th point out each time, and then average the resulting hypotheses over n. Sometimes the second approach gives better results, even though in the end you still use the same 100 points. The reason is that averaging may reduce variance, so the result is less affected by noise.
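The leave-one-out averaging described above might look like the sketch below. It assumes a linear least-squares model (for which averaging the weight vectors is the same as averaging the predictions) and synthetic data; `leave_one_out_aggregate` is a hypothetical name. For a stable learner like this one the aggregate comes out very close to the single fit on all 100 points, so the sketch shows only the mechanics. Any variance reduction would be expected to matter more for less stable models.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit; returns the weight vector."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def leave_one_out_aggregate(X, y):
    """Train N models, each leaving one point out, and average their weights."""
    N = len(y)
    weights = []
    for n in range(N):
        keep = np.arange(N) != n  # all points except the n-th
        weights.append(fit_linear(X[keep], y[keep]))
    return np.mean(weights, axis=0)

# Synthetic data (an assumption for illustration): 100 points, y = 0.5 + 2x + noise
rng = np.random.default_rng(2)
N = 100
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, N)])
y = X @ np.array([0.5, 2.0]) + 0.1 * rng.standard_normal(N)

w_single = fit_linear(X, y)            # one model trained on all 100 points
w_agg = leave_one_out_aggregate(X, y)  # average of 100 models trained on 99 points
print("single fit:", w_single)
print("aggregate: ", w_agg)
```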
