I see.

Example:

is the target.

is the input space.

If we let 1.

or

2.

, where t(1) is the t-distribution with one degree of freedom.

I know from my stat classes that in case 1 a linear model is actually "correct".

(this is great since we usually know nothing about f)

So in this case the distribution of X plays a role in selecting H, and hence in reducing the in-sample error (assuming the quadratic loss function).

Questions:

So in either case 1 or case 2, is the interpretation/computation of the sample error the same?

I am a little confused since the overall true error (which we hope the sample error approximates) is defined based on the joint distribution of (X,Y), which depends on the distribution of X.
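To make the question concrete, here is a minimal Python sketch of what I mean by "computing the sample error" under two input distributions. Everything specific in it is an assumption for illustration only: the target `hypothetical_target` (tanh), the noise level, the sample size, and the use of a standard normal as a stand-in for case 1. Only the t(1) inputs come from the example above. The point is just that `in_sample_error` is the same average of squared residuals in both cases; the distribution of X only changes which points get averaged.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit of the linear hypothesis h(x) = w0 + w1 * x."""
    A = np.column_stack([np.ones_like(x), x])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def in_sample_error(w, x, y):
    """Mean squared error of h on the sample (quadratic loss).

    Note that nothing here refers to how x was drawn.
    """
    return float(np.mean((w[0] + w[1] * x - y) ** 2))

def hypothetical_target(x):
    # Assumed target, for illustration only; not the f from the post.
    return np.tanh(x)

rng = np.random.default_rng(0)
N = 200  # assumed sample size

samples = {
    "normal (assumed stand-in for case 1)": rng.standard_normal(N),
    "t(1) (case 2)": rng.standard_t(df=1, size=N),
}

errors = {}
for name, x in samples.items():
    # Assumed noise model: small additive Gaussian noise on the target.
    y = hypothetical_target(x) + 0.1 * rng.standard_normal(N)
    w = fit_linear(x, y)
    errors[name] = in_sample_error(w, x, y)
    print(f"{name}: E_in = {errors[name]:.4f}")
```

The two numbers will generally differ, but the computation that produces them is identical. My question is whether the interpretation is identical too.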

Thanks. I hope this class/book can clear up some misconceptions about the theoretical framework of the learning problem once and for all.