08-21-2012, 12:52 PM
yaser
Re: Using the whole Data lec13

Originally Posted by Andrs
Thanks for the quick answer.
I would like to check that I really understood your recommendation: I will be consuming all my training data with the cross validation procedure. Through CV I select the model and the hypothesis (g-) with the corresponding parameters, and I get Ecv, which is a good estimate of Eout.
Your suggestion is that I could use this model (hypothesis set) and (re)train it on the full training data in order to select a new hypothesis (g+). This new hypothesis (g+) may do better than the hypothesis (g-), but the only safe estimate for Eout is the one that I got through cross validation (Ecv). The only "problem" here is that now I do not have any data to "test" this new hypothesis (g+).
The hypothesis trained on the full data set, denoted by g (which you refer to as g+), is indeed the result of this process. To estimate its E_{\rm out}, we still use the cross validation estimate computed for g^-, even though g is a different (but close enough) hypothesis, for the reason you outline: we have no cross validation data points left to evaluate g directly.
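To make the procedure concrete, here is a minimal sketch in Python with NumPy. Everything specific in it is an assumption for illustration and not something from the lecture: the toy data, the choice of regularized linear regression as the model, the candidate values of lambda, the 10-fold split, and the helper names fit_ridge and cv_error. It selects lambda by cross validation (each fold trains a g^-), retrains on all N points to get g, and reports E_cv as the estimate of E_{\rm out} for g.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Regularized least squares: solve (X^T X + lam I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_error(X, y, lam, n_folds=10):
    """Average held-out squared error; each fold trains its own g^-."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False              # hold this fold out
        w_minus = fit_ridge(X[train_mask], y[train_mask], lam)
        residual = X[val_idx] @ w_minus - y[val_idx]
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))

# Hypothetical toy data: noisy linear target (an assumption for this sketch).
rng = np.random.default_rng(0)
N, d = 100, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=N)

# Step 1: model selection by cross validation over candidate lambdas.
lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
cv_errors = {lam: cv_error(X, y, lam) for lam in lambdas}
best_lam = min(cv_errors, key=cv_errors.get)
E_cv = cv_errors[best_lam]

# Step 2: retrain the selected model on ALL N points to get g.
w_g = fit_ridge(X, y, best_lam)

# No points are left to test g directly, so E_cv (computed on the g^- runs)
# is the number reported as the estimate of E_out for g.
print("selected lambda:", best_lam)
print("E_cv, used as the estimate of E_out(g):", E_cv)
```

Note that the weights of the individual g^- runs are thrown away; only the selected lambda and E_cv survive into the final step, and g itself is never evaluated on held-out data.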
Where everyone thinks alike, no one thinks very much