Is there anything interesting to be said about the relationship between overfitting and the difference between $\bar{g}$ and the truly best hypothesis in $\mathcal{H}$? I think I've seen others say that deterministic noise does not depend on the size of the data set, but I am wondering whether this difference is what accounts for the overfitting caused by deterministic noise. For example, with a complex enough target function, a very large data set, and a simple model (high bias / deterministic noise), we are not really overfitting, as I understand it.

Edit: I've been assuming that the definitions of bias and variance take expectations over all data sets of a particular fixed size $N$. I don't think this was explicitly stated, but I also don't think it makes sense otherwise. In homework #4, I computed a value for $\bar{g}$ that was very far from the best possible hypothesis (lowest mean squared error), because $N$ was so low in that case.
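To make the gap concrete, here is a minimal simulation sketch. The setup is an assumption chosen for illustration and is not stated in the post: the target is sin(pi x), inputs are uniform on [-1, 1], and the hypothesis set is lines through the origin, h(x) = a x, fit by least squares. The function name `average_slope` is hypothetical. Under these assumptions, the slope that minimizes the mean squared error over the input distribution is 3/pi, and the sketch compares it with the slope of the average hypothesis for several data set sizes.

```python
import numpy as np

def average_slope(n_points, n_datasets=20_000, seed=0):
    """Estimate the slope of the average hypothesis for h(x) = a * x.

    Assumed setup (not from the post): target sin(pi * x), x uniform on [-1, 1].
    """
    rng = np.random.default_rng(seed)
    # One row per data set, n_points inputs in each.
    x = rng.uniform(-1.0, 1.0, size=(n_datasets, n_points))
    y = np.sin(np.pi * x)
    # Least-squares slope for a line through the origin, one per data set.
    slopes = (x * y).sum(axis=1) / (x * x).sum(axis=1)
    # Averaging the slopes gives the average hypothesis, since h is linear in a.
    return slopes.mean()

# Slope minimizing E[(a * x - sin(pi * x))^2] for x uniform on [-1, 1]:
# a* = E[x sin(pi x)] / E[x^2] = (1 / pi) / (1 / 3) = 3 / pi.
best_slope = 3.0 / np.pi

for n in (2, 10, 100):
    print(f"N={n:4d}  average slope={average_slope(n):.3f}  best slope={best_slope:.3f}")
```

In this assumed setup the average hypothesis sits well away from the best hypothesis when the data sets are tiny, and the gap shrinks as the data set size grows. That is consistent with reading the bias as a quantity defined for a fixed data set size.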