Quote:
Originally Posted by hsolo
Training data has d dimensions in the x's. If one ignored some of the dimensions and did linear regression with a reduced number d' of dimensions, one would presumably get a larger in-sample error than when considering all d dimensions?
Why, then, is the expected in-sample error, averaged over all data sets, increasing with the number of dimensions?

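To answer the first question: if you choose to omit some of the input variables, you will indeed get a larger (or at least not smaller) in-sample error. The reduced model is a special case of the full one, obtained by setting the weights of the omitted variables to zero, so least squares over all d variables can always do at least as well on the training set. I'm not sure I understand the second question, but having different training sets does not change the number of input variables. It is a hypothetical situation in which you assume the availability of different data sets on the same variables.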
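A small simulation may make both points concrete. It is only a sketch under assumptions not in the thread: the data sets are drawn from a hypothetical linear target with Gaussian noise, every data set uses the same d input variables, and the "reduced" model simply keeps the first d' of them. The helper names (`make_data_set`, `in_sample_error`, `compare`) are made up for illustration. On every individual data set the reduced fit has an in-sample error at least as large as the full fit, so the average over data sets behaves the same way.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def in_sample_error(X, y, cols):
    """Least-squares fit (with bias) on the chosen columns; return mean squared in-sample error."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(Z[0])
    # Normal equations: (Z^T Z) w = Z^T y
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    Zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    w = solve(ZtZ, Zty)
    residuals = [sum(wi * zi for wi, zi in zip(w, z)) - yi for z, yi in zip(Z, y)]
    return sum(r * r for r in residuals) / len(y)

def make_data_set(N, d, w_true, sigma, rng):
    """Hypothetical data: y = w0 + w . x + Gaussian noise, with x ~ N(0, I_d)."""
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(N)]
    y = [w_true[0] + sum(w * x for w, x in zip(w_true[1:], row)) + rng.gauss(0, sigma)
         for row in X]
    return X, y

def compare(num_sets=200, N=30, d=5, d_reduced=2, sigma=0.5, seed=0):
    """Return per-data-set in-sample errors using all d inputs vs. only the first d_reduced."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(d + 1)]
    full, reduced = [], []
    for _ in range(num_sets):  # many data sets on the same d variables
        X, y = make_data_set(N, d, w_true, sigma, rng)
        full.append(in_sample_error(X, y, range(d)))
        reduced.append(in_sample_error(X, y, range(d_reduced)))
    return full, reduced

if __name__ == "__main__":
    full, reduced = compare()
    print("average E_in, all d inputs:  ", sum(full) / len(full))
    print("average E_in, first d' inputs:", sum(reduced) / len(reduced))
```

Because the reduced model is nested inside the full one, the per-data-set inequality holds regardless of the seed or the particular target chosen; only the size of the gap depends on those assumptions.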