
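$\bar{g}(x) \approx \frac{1}{K}\sum_{k=1}^{K} g^{(\mathcal{D}_k)}(x)$ because $\bar{g}(x) = \mathbb{E}_{\mathcal{D}}\left[g^{(\mathcal{D})}(x)\right]$ is defined as an expectation, with respect to data sets $\mathcal{D}$, of $g^{(\mathcal{D})}(x)$. The average over a finite number of data sets approximates this expectation.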
Yes, $\bar{g}$ is not a valid hypothesis: it may not be in your hypothesis set; it may not even be binary. It is never used as a classifier. It is just used to represent "what would happen on average after learning", and this abstract function plays a role in defining the bias in the bias-variance decomposition.
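To make the "may not even be binary" point concrete, here is a rough sketch rather than anything from the course material. The target `target`, the threshold ("stump") hypothesis set, and the learning rule `learn_stump` are all made up for illustration. Each learned hypothesis outputs only -1 or +1, but their average over many data sets takes values strictly between -1 and +1 near the boundary.

```python
import random

def target(x):
    # Assumed toy target: f(x) = sign(x) on [-1, 1].
    return 1 if x >= 0 else -1

def learn_stump(data):
    # Hypothetical learning rule: place the threshold midway between the
    # largest negative-labeled x and the smallest positive-labeled x
    # (falling back to the interval ends if one class is absent).
    neg = [x for x, y in data if y == -1]
    pos = [x for x, y in data if y == 1]
    lo = max(neg) if neg else -1.0
    hi = min(pos) if pos else 1.0
    return (lo + hi) / 2

def stump(t, x):
    # A binary hypothesis h(x) = sign(x - t).
    return 1 if x >= t else -1

def g_bar(x, thresholds):
    # Finite-sample estimate of g-bar: average the K learned hypotheses at x.
    return sum(stump(t, x) for t in thresholds) / len(thresholds)

rng = random.Random(0)
K, N = 2000, 5  # number of data sets, points per data set
thresholds = []
for _ in range(K):
    xs = [rng.uniform(-1, 1) for _ in range(N)]
    data = [(x, target(x)) for x in xs]
    thresholds.append(learn_stump(data))

for x in (-0.9, -0.05, 0.05, 0.9):
    print(x, g_bar(x, thresholds))
```

Far from the boundary every learned stump agrees, so the average is exactly -1 or +1. Near x = 0 the stumps disagree, so the average is a fraction, which no single stump in the hypothesis set can output.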
Quote:
Originally Posted by Newbrict
I think $\bar{g}(x) \approx \frac{1}{K}\sum_{k=1}^{K} g^{(\mathcal{D}_k)}(x)$ because it's computed over a finite set of points, whereas the actual value for $\bar{g}(x)$ is an exact solution