Quote:
Originally Posted by hsolo
Training data has d dimensions in the x's. If one ignored some of the dimensions and did linear regression with reduced number d' of dimensions one would have larger in-sample errors presumably, compared to considering all d dimensions?
Why then is the expected in-sample error averaged over all data sets increasing with the number of dimensions?
To answer the first question: if you omit some of the input variables, the in-sample error will indeed be larger, or at least no smaller. The reduced model is a special case of the full one with the omitted weights fixed at zero, so the full least-squares fit can always match it. I'm not sure I understand the second question, but having different training sets does not change the number of input variables. Averaging over data sets is a hypothetical situation in which you assume that different data sets on the same variables are available.
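Both points can be checked numerically with a rough sketch like the one below. It is only an illustration under assumed conditions: a synthetic linear target with Gaussian noise, and hypothetical helper names (in_sample_mse, compare_reduced_vs_full, average_over_datasets) that are not from the original discussion. Each simulated data set uses the same d input variables; only the sample changes.

```python
import numpy as np


def in_sample_mse(X, y):
    """Fit least squares with a bias term and return the mean squared in-sample error."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend the constant coordinate
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return float(np.mean((Xb @ w - y) ** 2))


def compare_reduced_vs_full(N=50, d=5, d_reduced=2, noise=0.5, seed=0):
    """Generate one data set (assumed linear target plus noise) and fit both models."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + noise * rng.normal(size=N)
    e_reduced = in_sample_mse(X[:, :d_reduced], y)  # ignore the last d - d_reduced inputs
    e_full = in_sample_mse(X, y)                    # use all d inputs
    return e_reduced, e_full


def average_over_datasets(num_datasets=200, **kwargs):
    """Average both in-sample errors over many data sets on the same d variables."""
    pairs = [compare_reduced_vs_full(seed=s, **kwargs) for s in range(num_datasets)]
    reduced, full = zip(*pairs)
    return float(np.mean(reduced)), float(np.mean(full))


if __name__ == "__main__":
    print("single data set (reduced, full):", compare_reduced_vs_full())
    print("averaged over data sets (reduced, full):", average_over_datasets())
```

On every simulated data set the reduced fit should come out with an in-sample error at least as large as the full fit, so the same ordering holds for the averages.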