LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 7 (http://book.caltech.edu/bookforum/forumdisplay.php?f=136)
-   -   Discussion of Lecture 13 "Validation" (http://book.caltech.edu/bookforum/showthread.php?t=4013)

 vikasatkin 02-20-2013 04:18 PM

Discussion of Lecture 13 "Validation"

Links: [Lecture 13 slides] [all slides] [Lecture 13 video]

Question: (Slide 7/22) 1. Why do we report g instead of g^-? 2. Why do we report E_val(g^-) instead of E_val(g)?

Answer: 1. Because from theoretical analysis we know that the more points we have in the training set, the better the learning outcome. So it is better to train on all N points than on N-K, even though we can't measure how much better g is than g^-.
2. Because we can't report E_val(g): we trained g on all N points, so we have no fresh points left on which to compute E_val(g) (we can't reuse some of these N points, because they are already contaminated by training).
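The scheme above can be sketched in a few lines of numpy. This is my own toy illustration, not code from the lecture: a least-squares line fit stands in for an arbitrary learner, and the names (fit_line, mse, g_minus) are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_line(X, y):
    """Least-squares fit of y = a*x + b; a stand-in for any learner."""
    A = np.column_stack([X, np.ones_like(X)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(coef, X, y):
    """Mean squared error of the fitted line on (X, y)."""
    A = np.column_stack([X, np.ones_like(X)])
    return float(np.mean((A @ coef - y) ** 2))

# Toy dataset of N points; K of them are held out for validation.
N, K = 100, 20
X = rng.uniform(-1, 1, N)
y = 2 * X + 1 + 0.1 * rng.standard_normal(N)

# Train g_minus on N-K points; estimate its error on the K held-out points.
g_minus = fit_line(X[:N - K], y[:N - K])
E_val = mse(g_minus, X[N - K:], y[N - K:])

# Report g trained on all N points, with E_val(g_minus) as the error estimate:
# g should be at least as good as g_minus, so the estimate is (slightly) pessimistic.
g = fit_line(X, y)
print(E_val)
```

Note that E_val is computed for g_minus, yet the hypothesis handed to the customer is g; that mismatch is exactly what the two questions above are about.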

 vikasatkin 02-20-2013 04:23 PM

Discussion of Lecture 13 "Validation"

Question: (Slide 7/22) The rule of thumb is N/5. Why do we have N/10 on the last slide?

Answer: On the last slide we use cross-validation, while on slide 7/22 we do validation just once, so it is a different game. In cross-validation we should use fewer points for validation, because we repeat the process and eventually use all the points for validation anyway, so we might as well increase the training set size N-K. The question is how much smaller K should be. We might take K=1 (as in leave-one-out). Of course, if the dataset is large, that would take too much time. Actually, computation is not the only reason. It looks counterintuitive, but in some situations the leave-one-out cross-validation error has stronger fluctuations, so you would use 10-fold cross-validation even if you are not worried about computational resources. E.g., if you have just 100 examples, 10-fold cross-validation may give you better stability than leave-one-out.
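To make the 10-fold vs. leave-one-out comparison concrete, here is a minimal numpy sketch of K-fold cross-validation; leave-one-out is just the special case of N folds. The learner (a least-squares line fit) and the function name cv_error are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def cv_error(X, y, n_folds):
    """Mean validation error over n_folds folds for a least-squares line fit."""
    N = len(X)
    idx = np.arange(N)
    errors = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        # Fit on everything outside the fold...
        A_tr = np.column_stack([X[train], np.ones(len(train))])
        coef, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        # ...and measure squared error on the fold itself.
        A_va = np.column_stack([X[fold], np.ones(len(fold))])
        errors.append(np.mean((A_va @ coef - y[fold]) ** 2))
    return float(np.mean(errors))

N = 100
X = rng.uniform(-1, 1, N)
y = 2 * X + 1 + 0.1 * rng.standard_normal(N)

e_10fold = cv_error(X, y, 10)  # K = N/10 points per validation fold
e_loo = cv_error(X, y, N)      # leave-one-out: K = 1
print(e_10fold, e_loo)
```

Note that leave-one-out costs N fits versus 10 for 10-fold; rerunning with different seeds is one way to see which estimate fluctuates more on a given dataset.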

 vikasatkin 02-20-2013 04:26 PM

Re: Discussion of Lecture 13 "Validation"

Question: (Slide 11/22) If you already have all the hypotheses, why do you do validation and choose one model instead of aggregating them?

Answer: In practice people often use aggregation and often it does perform better.

If you have 100 points, you can train on all of them once, or you can train on 99 points (leaving the n-th point out) and then average the resulting hypotheses over n. Sometimes you get better results this way, despite the fact that you still use the same 100 points. The reason is that this averaging may reduce variance, so the result is less affected by the noise.
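The leave-one-out averaging described above can be sketched as follows. This is my own illustration with a least-squares line fit; for a linear model, averaging the coefficients of the 100 fits is the same as averaging their predictions.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_line(X, y):
    """Least-squares fit of y = a*x + b; a stand-in for any learner."""
    A = np.column_stack([X, np.ones_like(X)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

N = 100
X = rng.uniform(-1, 1, N)
y = 2 * X + 1 + 0.3 * rng.standard_normal(N)

# Single hypothesis trained on all 100 points.
single = fit_line(X, y)

# Aggregate: N hypotheses, each trained with the n-th point left out,
# then averaged (coefficient averaging == prediction averaging here).
coefs = [fit_line(np.delete(X, n), np.delete(y, n)) for n in range(N)]
aggregate = np.mean(coefs, axis=0)
print(single, aggregate)
```

For a plain linear fit the two results are nearly identical; the variance reduction matters more for unstable, high-variance learners.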
