08-21-2012, 12:06 PM
yaser
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Using the whole Data lec13

Originally Posted by Andrs
In lecture 13 (Validation),
the 10-fold cross-validation mechanism with the training data D is used to select the best "learning model". My question is whether there is any point in running the selected hypothesis (the best hypothesis in the selected model) on the whole training data set D in order to get a better estimate of Eout, or is Ecv (the cross-validation error) a good enough estimate of Eout?
It is a good idea to restore the full data set and use it for training once the model has been selected. The problem with using the full data set to estimate E_{\rm out} for any hypothesis in this process is that part of the data set has already been used for training to arrive at that hypothesis, so that part carries a built-in bias. The cross-validation data points, although fewer, do not have that bias; hence their estimate of E_{\rm out} is more reliable.
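A minimal Python sketch of the point above, under assumptions that are not from the lecture: a hypothetical 1-D target y = sin(2πx) plus Gaussian noise, and k-nearest-neighbour regression where the choice of k plays the role of the "model" being selected. The helper names (`make_data`, `knn_predict`, `squared_error`, `cv_error`) are illustrative only. The sketch runs 10-fold cross-validation to pick k, "retrains" on the full D (for k-NN that just means keeping all points), and then scores the final hypothesis on D itself. With k = 1 that in-sample score is exactly zero, because every training point is its own nearest neighbour. This is an extreme case of the built-in bias described above.

```python
import math
import random

def make_data(n, seed):
    """Hypothetical target (assumption): y = sin(2*pi*x) + Gaussian noise, x ~ U[0, 1]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.3)))
    return data

def knn_predict(train, x, k):
    """k-NN regression: mean y of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def squared_error(train, points, k):
    """Mean squared error on `points` of the k-NN hypothesis built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)

def cv_error(data, k, folds=10):
    """E_cv: each point is validated exactly once, by a hypothesis that never saw it."""
    total = 0.0
    for f in range(folds):
        val = [p for i, p in enumerate(data) if i % folds == f]
        train = [p for i, p in enumerate(data) if i % folds != f]
        total += squared_error(train, val, k) * len(val)
    return total / len(data)

data = make_data(100, seed=0)
candidates = [1, 3, 5, 9, 15]  # hypothetical set of models (values of k)

# Model selection: pick the k with the smallest cross-validation error.
e_cv = {k: cv_error(data, k) for k in candidates}
best_k = min(e_cv, key=e_cv.get)

# Final hypothesis: for k-NN, "training on all of D" just means keeping every point.
# Scoring it on D itself is biased, since every point helped build it.
e_in_full = squared_error(data, data, best_k)

print("E_cv per k:", {k: round(v, 3) for k, v in e_cv.items()})
print("selected k:", best_k)
print("E_cv of selected model:", round(e_cv[best_k], 3))
print("error of final hypothesis on D (optimistic):", round(e_in_full, 3))
print("1-NN scored on its own training data:", squared_error(data, data, 1))
```

Typically the error on D for the selected k comes out below its E_cv. Because E_cv was also used to choose k, the minimum over the candidates is itself slightly optimistic. That effect is much milder than the contamination you get from re-scoring on points that were used for training.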