Thread: HW 2 Problem 6
07-23-2012, 07:59 PM
ilya239
Re: HW 2 Problem 6

Originally Posted by rakhlin
When I generate new data and a new hypothesis for each of the 1000 runs (as the problem suggests), I get a stable out-of-sample result close to (slightly greater than) the in-sample error.
When I estimate 1000 different out-of-sample errors for one in-sample set and a single hypothesis, I get very different average error rates, with high variability from 0.01 to 0.13. Why so?
That's like problem 1: if you try enough different out-of-sample sets, inevitably there will be one on which your hypothesis does great and one on which it does badly. As an extreme case, if your out-of-sample test size were 1 instead of 1000, you'd get a 0% error rate on some of these sets and a 100% error rate on others. To get an actual estimate of the out-of-sample error rate, you should pool all of them together.
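Here is a rough sketch of that effect, not the homework solution. It assumes a fixed hypothesis whose true error rate is a made-up 0.05 (the constant `TRUE_ERROR` and the helper `simulate_error_rates` are hypothetical names for illustration) and models each test point as an independent coin flip that comes up "mistake" with that probability. It then looks at how much the per-set error rate moves around for different test sizes, and at what you get when you pool all the sets.

```python
import random
import statistics


def simulate_error_rates(true_error, test_size, num_test_sets, seed=0):
    """Return the observed error rate on each of num_test_sets independent test sets.

    Assumption: every test point is misclassified independently with
    probability true_error (a stand-in for a fixed hypothesis).
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(num_test_sets):
        mistakes = sum(rng.random() < true_error for _ in range(test_size))
        rates.append(mistakes / test_size)
    return rates


TRUE_ERROR = 0.05  # hypothetical value, chosen only for illustration

for test_size in (1, 100, 1000):
    rates = simulate_error_rates(TRUE_ERROR, test_size, num_test_sets=1000)
    # All test sets have the same size, so the mean of the per-set rates
    # equals the error rate over all the test points pooled together.
    pooled = statistics.fmean(rates)
    print(
        f"test_size={test_size:5d}  "
        f"min={min(rates):.3f}  max={max(rates):.3f}  pooled={pooled:.3f}"
    )
```

With a test size of 1, every individual rate is either 0 or 1, which is the extreme case above. Larger test sets narrow the spread, and the pooled figure is the one to trust as the estimate.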