The VC dimension is a single number that is a property of the hypothesis set.

But what is the "bias of a hypothesis set"? Bias seems to depend also on the dataset size and the learning algorithm, since it depends on $\bar{g}(x) = \mathbb{E}_{\mathcal{D}}[g^{(\mathcal{D})}(x)]$; $g^{(\mathcal{D})}$ depends on the learning algorithm, and the set of datasets over which the expectation is taken depends on the dataset size.
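To make the dependence on dataset size concrete, here is a minimal simulation sketch. Everything in it is my own assumption for illustration: the target $f(x) = \sin(\pi x)$ on $[-1, 1]$, the hypothesis set of lines $h(x) = ax + b$, least squares as the learning algorithm, and the helper names `fit_line` and `estimate_bias`. Because a line is linear in its parameters, the average hypothesis is taken to be the line with the averaged parameters.

```python
import math
import random


def target(x):
    # Assumed target function for this illustration.
    return math.sin(math.pi * x)


def fit_line(xs, ys):
    """Least-squares line y = a*x + b through the points (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0  # guard against coincident x values
    return a, y_mean - a * x_mean


def estimate_bias(n_points, n_datasets, rng):
    """Monte Carlo estimate of E_x[(g_bar(x) - f(x))^2], x uniform on [-1, 1]."""
    a_sum = b_sum = 0.0
    for _ in range(n_datasets):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
        a, b = fit_line(xs, [target(x) for x in xs])
        a_sum += a
        b_sum += b
    # Average hypothesis g_bar(x) = a_bar * x + b_bar.
    a_bar, b_bar = a_sum / n_datasets, b_sum / n_datasets
    # Midpoint grid on [-1, 1] approximates the expectation over x.
    grid = [-1.0 + (i + 0.5) * 2.0 / 1000 for i in range(1000)]
    return sum((a_bar * x + b_bar - target(x)) ** 2 for x in grid) / len(grid)


rng = random.Random(0)  # fixed seed so the run is repeatable
bias_small = estimate_bias(2, 20000, rng)
bias_large = estimate_bias(50, 2000, rng)
print(f"N = 2:  bias ~ {bias_small:.3f}")
print(f"N = 50: bias ~ {bias_large:.3f}")
```

In this setup the two estimates need not coincide, which is what prompts the question: the same hypothesis set and the same algorithm give a different average hypothesis for each dataset size.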

Slide 4 says that bias measures "how well $\mathcal{H}$ can approximate $f$". Does this mean "with a sufficiently large dataset and a perfect learning algorithm"?

Is the bias of a (hypothesis set, learning algorithm) combination a single value -- the asymptote of the learning curve? Or is there some notion of bias that is a property of a hypothesis set by itself? If the hypothesis set contains the target function, that does not mean the bias is zero, does it? The beginning of the lecture seems to imply otherwise, but if there is no restriction on the learning algorithm, what guarantees that the average function will in fact be close to the target function for large enough dataset size?

Or is it assumed that the learning algorithm always picks a hypothesis which minimizes $E_{\text{in}}$?
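The worry about an unrestricted learning algorithm can be sketched as follows. This is a toy example under my own assumptions, not something from the lecture: the target is $f(x) = x$, so the hypothesis set of lines contains it, and the two algorithms are least squares (which minimizes the in-sample squared error) and a hypothetical `always_zero` rule that ignores the data and returns $h(x) = 0$.

```python
import random


def target(x):
    # Assumed target f(x) = x, which lies inside H = {a*x + b}.
    return x


def least_squares(xs, ys):
    """Pick the line that minimizes the in-sample squared error."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0  # guard against coincident x values
    return a, y_mean - a * x_mean


def always_zero(xs, ys):
    """Hypothetical algorithm: ignore the data and return h(x) = 0."""
    return 0.0, 0.0


def estimate_bias(algorithm, n_points, n_datasets, rng):
    """Monte Carlo estimate of E_x[(g_bar(x) - f(x))^2], x uniform on [-1, 1]."""
    a_sum = b_sum = 0.0
    for _ in range(n_datasets):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
        a, b = algorithm(xs, [target(x) for x in xs])
        a_sum += a
        b_sum += b
    # Average hypothesis is the line with the averaged parameters.
    a_bar, b_bar = a_sum / n_datasets, b_sum / n_datasets
    grid = [-1.0 + (i + 0.5) * 2.0 / 1000 for i in range(1000)]
    return sum((a_bar * x + b_bar - target(x)) ** 2 for x in grid) / len(grid)


rng = random.Random(0)  # fixed seed so the run is repeatable
bias_ls = estimate_bias(least_squares, 5, 1000, rng)
bias_zero = estimate_bias(always_zero, 5, 1000, rng)
print(f"least squares: bias ~ {bias_ls:.6f}")
print(f"always zero:   bias ~ {bias_zero:.6f}")
```

Both algorithms draw from the same hypothesis set, yet only the first one recovers the target on noise-free data, so the estimated bias differs between them. That suggests the quantity is tied to the (hypothesis set, learning algorithm) pair unless some assumption about the algorithm is implied.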