The point x has nothing to do with the data sets on which you learn. Fix
any point x.
You can now compute M1=Ed[gk(x)].
You can also compute M2=Ed[gk(x)^2].
M1 and M2 are just two numbers that apply to the point x. Clearly M1 and M2 will change if you change x, so M1 and M2 are functions of x.
Now, for example, if you have many x's (e.g. a test set), you can compute the average of M1(x) and M2(x) over those x's. This means you have to compute M1 and M2 for each of those x's. You can use the same learning data sets to do so.
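The recipe above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not part of the discussion: the target f(x) = sin(pi x), a straight-line hypothesis fit by least squares, and the values of K, N and the test grid are all made up for the example. It uses the identity var(x) = M2(x) - M1(x)^2, and bias(x) = (M1(x) - f(x))^2, which needs a known target.

```python
import numpy as np

def bias_variance(K=2000, N=5, n_test=200, seed=0):
    """Estimate bias and var by reusing the same K data sets at every test point x."""
    rng = np.random.default_rng(seed)
    f = lambda x: np.sin(np.pi * x)           # assumed target, for illustration only
    x_test = np.linspace(-1.0, 1.0, n_test)   # the fixed points x (e.g. a test set)

    # Row k holds g_k evaluated at every test point.
    G = np.empty((K, n_test))
    for k in range(K):
        x_train = rng.uniform(-1.0, 1.0, N)   # k-th learning data set
        slope, intercept = np.polyfit(x_train, f(x_train), 1)
        G[k] = slope * x_test + intercept

    M1 = G.mean(axis=0)          # M1(x) = Ed[gk(x)], i.e. the average function g_(x)
    M2 = (G ** 2).mean(axis=0)   # M2(x) = Ed[gk(x)^2]

    var_x = M2 - M1 ** 2             # var(x) at each test point
    bias_x = (M1 - f(x_test)) ** 2   # bias(x) at each test point

    # Averaging over the test points approximates Ex[.]
    return bias_x.mean(), var_x.mean(), M1, M2, G

bias, var, M1, M2, G = bias_variance()
print(f"bias = {bias:.3f}, var = {var:.3f}")
```

Note that the K learned functions are computed once and then evaluated at every x, so no additional data sets are needed for the average over x.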
Quote:
Originally Posted by mileschen
I still have some questions.
var = Ex[var(x)], but var(x) = Ed[(gk(x) - g_(x))^2], where var(x) is computed from the K data sets that were used to learn the average function g_(x). Then how do I compute var, which is the expected value of var(x)?
If var is computed on the same data sets that were used to learn the average function, then how do I compute bias = Ex[bias(x)]? Is it also computed on the same data sets that were used to learn the average function?
