Quote:
Originally Posted by rakhlin
When I generate new data and a new hypothesis for every single run of 1000 (as the problem suggests), I get a stable out-of-sample result close to (slightly greater than) the in-sample error.
When I estimate 1000 different out-of-sample errors for one in-sample set and a single hypothesis, I get very different average error rates, with high variability from 0.01 to 0.13. Why so?

Just to clarify: you used the in-sample points to train and arrived at a final set of weights (corresponding to the final hypothesis). Each out-of-sample point is then tested on this hypothesis and compared to the target value at the same point. Now, what exactly do you do to get the two scenarios you are describing?
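
For concreteness, here is a minimal sketch of the two experiments as I understand them, assuming the usual 2D setup (a random linear target on [-1,1]^2, N = 100 training points, linear regression as the learning algorithm). All names, sizes, and the choice of learning algorithm are illustrative, not taken from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_target(rng):
    """A random linear target f(x) = sign(w_f . [1, x1, x2]) through two random points."""
    p, q = rng.uniform(-1, 1, (2, 2))
    a, b = q[1] - p[1], p[0] - q[0]
    return np.array([-(a * p[0] + b * p[1]), a, b])

def sample(rng, n):
    """n points uniform in [-1,1]^2, with a bias column prepended."""
    X = rng.uniform(-1, 1, (n, 2))
    return np.column_stack([np.ones(n), X])

def fit(X, y):
    """One-shot linear regression via the pseudo-inverse."""
    return np.linalg.pinv(X) @ y

def err(w, X, y):
    """Fraction of points misclassified by sign(X @ w)."""
    return np.mean(np.sign(X @ w) != y)

N, M, R = 100, 1000, 1000  # train size, test size, number of runs

# Scenario 1: a fresh target, fresh training set, and fresh hypothesis every run.
# Each run's out-of-sample error is measured on its own fresh test set.
e_a = []
for _ in range(R):
    w_f = random_target(rng)
    Xtr = sample(rng, N)
    w_g = fit(Xtr, np.sign(Xtr @ w_f))
    Xte = sample(rng, M)
    e_a.append(err(w_g, Xte, np.sign(Xte @ w_f)))

# Scenario 2: ONE target and ONE trained hypothesis, tested on R fresh test sets.
w_f = random_target(rng)
Xtr = sample(rng, N)
w_g = fit(Xtr, np.sign(Xtr @ w_f))
e_b = []
for _ in range(R):
    Xte = sample(rng, M)
    e_b.append(err(w_g, Xte, np.sign(Xte @ w_f)))

print(f"Scenario 1: mean {np.mean(e_a):.4f}, std {np.std(e_a):.4f}")
print(f"Scenario 2: mean {np.mean(e_b):.4f}, std {np.std(e_b):.4f}")
```

In this sketch, scenario 2 estimates the out-of-sample error of one particular hypothesis, so its runs should scatter only by test-set sampling noise around that hypothesis's true E_out; scenario 1 additionally varies the hypothesis itself from run to run. A spread as wide as 0.01 to 0.13 for a single fixed hypothesis would suggest something else is changing between runs.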