Quote:
Originally Posted by samirbajaj
I don't understand the Pr( f(x) != g(x) ) expression -- what exactly does this mean? Once the algorithm has converged, presumably f(x) matches g(x) on all data, so the difference is zero.
On all data, yes. However, the probability is with respect to x over the entire input space, not restricted to x being in the finite data set used for training.
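
To make the distinction concrete, here is a minimal Python sketch (an illustration only, not taken from the course material). It assumes a hypothetical target f given by a random line in [-1, 1]^2, trains a perceptron g on 10 points until it agrees with f on all of them, and then estimates Pr( f(x) != g(x) ) by drawing fresh x uniformly from the whole input space. The function names, the uniform input distribution, and the sample sizes are all assumptions chosen for the example.

```python
import random


def sign(v):
    """Return +1 for positive v, otherwise -1."""
    return 1 if v > 0 else -1


def classify(w, p):
    """Label point p = (x1, x2) using the line w = (w0, w1, w2)."""
    return sign(w[0] + w[1] * p[0] + w[2] * p[1])


def random_point(rng):
    """Draw x uniformly from the input space [-1, 1]^2 (an assumption)."""
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))


def make_target(rng):
    """Hypothetical target f: the line through two random points."""
    (a1, a2), (b1, b2) = random_point(rng), random_point(rng)
    return (a1 * b2 - b1 * a2, a2 - b2, b1 - a1)


def train_perceptron(points, labels, rng):
    """Perceptron learning: update on a misclassified point until none remain."""
    w = [0.0, 0.0, 0.0]
    while True:
        wrong = [i for i, p in enumerate(points) if classify(w, p) != labels[i]]
        if not wrong:
            return w  # converged: g matches f on every training point
        i = rng.choice(wrong)
        w[0] += labels[i]
        w[1] += labels[i] * points[i][0]
        w[2] += labels[i] * points[i][1]


def disagreement(f, g, points):
    """Fraction of the given points on which f and g disagree."""
    return sum(classify(f, p) != classify(g, p) for p in points) / len(points)


def run_experiment(seed=0, n_train=10, n_test=100_000):
    rng = random.Random(seed)
    f = make_target(rng)
    train = [random_point(rng) for _ in range(n_train)]
    labels = [classify(f, p) for p in train]
    g = train_perceptron(train, labels, rng)
    # Fresh points stand in for "x over the entire input space".
    fresh = [random_point(rng) for _ in range(n_test)]
    return disagreement(f, g, train), disagreement(f, g, fresh)


if __name__ == "__main__":
    e_in, e_out = run_experiment()
    print("disagreement on training data:", e_in)
    print("estimated Pr( f(x) != g(x) ):", e_out)
```

The disagreement on the training data is zero by construction once the loop exits. The estimate on fresh points is typically nonzero, and that second number is what Pr( f(x) != g(x) ) refers to.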