I will try to reformulate the construction of Dval in such a way that the independence is patent, and then try to suggest where your confusion may be coming from.
Construction 1 of Dval: Randomly generate D. Randomly partition it into Dtrain (N-K points) and Dval (K points). Learn on Dtrain to obtain $g^-$ and compute Eval of $g^-$ using Dval. (This is the standard validation setting in the book.)
Construction 2 of Dval: Randomly generate N-K points to form Dtrain. Learn on Dtrain to obtain $g^-$. Now, randomly generate another K points to form Dval. Compute Eval of $g^-$ using Dval.
It is patently clear in Construction 2 that Eval is an unbiased estimate of Eout of $g^-$, essentially by definition of Eout. There is no statistical difference between Constructions 1 and 2 in terms of the Dtrain and Dval they produce. Randomly generating N points and splitting them randomly into N-K and K points is statistically equivalent to first randomly generating N-K points and then another K random points. In Construction 1 you generate both Dtrain and Dval at the beginning, process Dtrain, and then test on Dval. In Construction 2 you only generate Dval after you have processed Dtrain. But Dval has the same statistical properties in both cases.
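The equivalence of the two constructions can also be checked numerically. The following is only a minimal sketch under assumptions that are not in the post: a toy distribution (x uniform on [-1, 1], label sign(x) flipped with probability 0.1), a toy learner that thresholds at the mean training input, and illustrative values N = 50, K = 20. All function names are hypothetical. For this toy setup Eout of a threshold classifier has a closed form, so the average of Eval minus Eout can be compared with zero under each construction.

```python
import random

N, K = 50, 20   # illustrative sizes (assumed, not from the post)
NOISE = 0.1     # label-flip probability of the toy target (assumed)


def generate(rng, count):
    """Draw `count` i.i.d. points: x uniform on [-1, 1], y = sign(x) flipped w.p. NOISE."""
    points = []
    for _ in range(count):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x >= 0 else -1
        if rng.random() < NOISE:
            y = -y
        points.append((x, y))
    return points


def learn(d_train):
    """Toy learner: return the threshold (mean training input) that defines g^-."""
    return sum(x for x, _ in d_train) / len(d_train)


def e_val(threshold, d_val):
    """Fraction of validation points misclassified by h(x) = +1 if x >= threshold else -1."""
    wrong = sum((1 if x >= threshold else -1) != y for x, y in d_val)
    return wrong / len(d_val)


def e_out(threshold):
    """Exact out-of-sample error of the threshold classifier under the toy distribution."""
    disagree = abs(threshold) / 2  # P(h(x) != sign(x)) for x uniform on [-1, 1]
    return NOISE + (1 - 2 * NOISE) * disagree


def construction_1(rng):
    """Generate D, partition it at random, learn on Dtrain."""
    d = generate(rng, N)
    rng.shuffle(d)
    d_train, d_val = d[:N - K], d[N - K:]
    return learn(d_train), d_val


def construction_2(rng):
    """Generate Dtrain, learn on it, and only then generate Dval."""
    d_train = generate(rng, N - K)
    g_minus = learn(d_train)
    d_val = generate(rng, K)
    return g_minus, d_val


def mean_gap(construction, trials=5000, seed=0):
    """Average of Eval(g^-) - Eout(g^-) over many independent repetitions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g_minus, d_val = construction(rng)
        total += e_val(g_minus, d_val) - e_out(g_minus)
    return total / trials


if __name__ == "__main__":
    print("Construction 1, mean(Eval - Eout):", mean_gap(construction_1))
    print("Construction 2, mean(Eval - Eout):", mean_gap(construction_2))
```

Both averages should come out close to zero, which is what unbiasedness of Eval predicts under either construction. A simulation like this illustrates the claim for one toy setting; it does not prove it.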
Now for where you may be getting subtly confused. It is true that the value of Eval will change based on what specific partition was selected, in part because $g^-$ changes and in part because Dval also changes. This means that $g^-$ depends on the partition. This is equivalent to saying (in the Construction 2 setting) that $g^-$ depends on the particular Dtrain generated (no surprise there). However, $g^-$ does not depend on the contents of Dval: if you change the data points in Dval, $g^-$ will not change. The expectation in (4.8) is an expectation over the data points in Dval, and $g^-$ does not depend on those points (the partition is fixed and now we are looking at what points are in Dval).
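Written out, the step in question has roughly the following form. This is a sketch from the definitions of Eval and Eout, not a quotation of (4.8); the pointwise error measure $e$ and the indexing of the K validation points are notation assumed here.

```latex
\begin{align*}
\mathbb{E}_{\mathcal{D}_{\text{val}}}\!\left[E_{\text{val}}(g^-)\right]
  &= \frac{1}{K}\sum_{n=1}^{K}
     \mathbb{E}_{(x_n,\,y_n)}\!\left[e\!\left(g^-(x_n),\,y_n\right)\right] \\
  &= \frac{1}{K}\sum_{n=1}^{K} E_{\text{out}}(g^-) \\
  &= E_{\text{out}}(g^-)
\end{align*}
```

The second line is where independence is used: with Dtrain fixed, $g^-$ is a fixed hypothesis, and each validation point is a fresh draw from the input distribution, so the expected pointwise error on it is Eout of $g^-$ by definition.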
Quote:
Originally Posted by arapmv
Hello,
I have a question regarding the calculation (4.8) on p.139 in the book. The final hypothesis $g^-$ depends, albeit indirectly, on the choice of the validation set $\mathcal{D}_{\text{val}}$. Indeed, by construction we train on the complement of $\mathcal{D}_{\text{val}}$ in $\mathcal{D}$, which makes $g^-$ dependent on the choice of $\mathcal{D}_{\text{val}}$. The derivation (4.8) appears to rely on the assumption that $g^-$ is independent of $\mathcal{D}_{\text{val}}$.
Does anyone have any comments on this?