Thread: Question 1
07-15-2013, 01:38 PM
yaser
Re: Question 1

Originally Posted by hsolo
The training data has d dimensions in the x's. If one ignored some of the dimensions and did linear regression with a reduced number d' of dimensions, one would presumably get a larger in-sample error than when using all d dimensions?

Why, then, does the expected in-sample error, averaged over all data sets, increase with the number of dimensions?
To answer the first question: if you choose to omit some of the input variables, you will indeed get a larger (or at least not smaller) in-sample error, since the reduced model is just the full model with the weights of the omitted variables fixed at zero. I'm not sure I understand the second question, but having different training sets does not change the number of input variables. It is a hypothetical situation where you assume the availability of different data sets on the same variables.
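To make both points concrete, here is a minimal pure-Python sketch, assuming a hypothetical noisy linear target with Gaussian inputs (the functions `solve`, `in_sample_error`, `compare` and all parameter values are illustrative, not from the course). It draws many training sets on the same d variables, fits least squares once with a bias plus all d inputs and once with a bias plus only the first d' inputs, and averages the in-sample error of each over the data sets.

```python
import random


def solve(A, b):
    """Solve the square linear system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix (copy)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def in_sample_error(X, y, cols):
    """Mean squared in-sample error of least squares on a bias term plus the inputs in cols."""
    Z = [[1.0] + [x[j] for j in cols] for x in X]
    k = len(Z[0])
    # Normal equations: (Z^T Z) w = Z^T y
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    w = solve(ZtZ, Zty)
    residuals = [sum(wi * zi for wi, zi in zip(w, z)) - yi for z, yi in zip(Z, y)]
    return sum(r * r for r in residuals) / len(y)


def compare(num_datasets=200, n=20, d=5, d_reduced=2, sigma=0.5, seed=0):
    """Average E_in over many data sets on the same d variables: all d inputs vs. the first d_reduced."""
    rng = random.Random(seed)
    # Hypothetical noisy linear target, fixed across all data sets
    w_true = [rng.uniform(-1.0, 1.0) for _ in range(d + 1)]
    full_total = reduced_total = 0.0
    violations = 0  # data sets where dropping inputs lowered E_in (should stay 0)
    for _ in range(num_datasets):
        # A fresh training set, always on the same d input variables
        X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
        y = [w_true[0] + sum(wj * xj for wj, xj in zip(w_true[1:], x)) + rng.gauss(0.0, sigma)
             for x in X]
        e_full = in_sample_error(X, y, range(d))
        e_reduced = in_sample_error(X, y, range(d_reduced))
        if e_full > e_reduced + 1e-12:  # small tolerance for rounding
            violations += 1
        full_total += e_full
        reduced_total += e_reduced
    return full_total / num_datasets, reduced_total / num_datasets, violations


if __name__ == "__main__":
    avg_full, avg_reduced, violations = compare()
    print(f"average E_in with all inputs:   {avg_full:.4f}")
    print(f"average E_in with fewer inputs: {avg_reduced:.4f}")
    print(f"data sets where fewer inputs did better: {violations}")
```

On every individual data set the fit that uses all inputs should have an in-sample error no larger than the reduced fit, so the averages keep the same ordering; the number of input variables d stays fixed across all the data sets being averaged. The normal equations are solved by hand only to avoid dependencies; with NumPy one would normally call `numpy.linalg.lstsq` instead.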
Where everyone thinks alike, no one thinks very much