Quote:
Originally Posted by samirbajaj
If we are sampling uniformly from the interval [-1, 1] for the calculation of g_bar, as well as for each data set (g_d), why would the variance be anything but a very small quantity? In the general case, when the data sets are not drawn from a uniform distribution, a nonzero variance makes sense, but if there is sufficient overlap between the data sets, it makes intuitive sense that the variance should be close to zero.

g_bar is the average of g^(D) over different data sets D. You will get different g^(D)'s when you pick different D's, since the final hypothesis depends on the data set used for training. Therefore, there will be a variance that measures how different these g^(D)'s are around their expected value g_bar (which does not depend on D, as D gets integrated out in the calculation of g_bar).

This argument holds for any probability distribution, uniform or not, that is used to generate the different D's.
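To see this numerically, here is a minimal sketch of the idea. It assumes (for illustration only) the target f(x) = sin(pi*x) and the hypothesis set h(x) = a*x, with each data set D consisting of two points drawn uniformly from [-1, 1]; these specifics are my own choices, not taken from the post above. Even though every D is drawn from the same uniform distribution, each one produces a different slope, so the variance around g_bar is far from zero:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_slope(x, y):
    # Least-squares fit of h(x) = a*x: a = sum(x*y) / sum(x^2)
    return np.dot(x, y) / np.dot(x, x)

def f(x):
    # Target function (an assumption for this sketch)
    return np.sin(np.pi * x)

n_datasets = 10000
slopes = np.empty(n_datasets)
for i in range(n_datasets):
    # Each data set D: two points drawn uniformly from [-1, 1]
    x = rng.uniform(-1.0, 1.0, size=2)
    slopes[i] = fit_slope(x, f(x))

# g_bar is the average hypothesis: its slope is the mean of the fitted slopes
a_bar = slopes.mean()

# variance = E_D E_x[(g^(D)(x) - g_bar(x))^2]
#          = Var_D(a) * E_x[x^2], and E_x[x^2] = 1/3 for x ~ U[-1, 1]
var_g = slopes.var() * (1.0 / 3.0)

print("slope of g_bar:", round(a_bar, 2))
print("variance:", round(var_g, 2))
```

The variance comes out clearly nonzero: the g^(D)'s genuinely disagree with one another, even though every data set was sampled from the same distribution. The "overlap" between data sets does not help, because with only a few points per data set the fitted hypothesis still swings substantially from one D to the next.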