I'm trying to understand why the large-margin requirement affects the growth function. For any required margin, we can find three points far enough apart from each other that they are shattered by perceptrons achieving at least that margin. How, then, can the growth function at n = 3 be less than 8?
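To make that claim concrete, here is a small sketch (pure Python; all function names are mine, not from any course material). For three points in the plane, each non-trivial dichotomy splits them 1-vs-2, and the maximum-margin separator is the perpendicular bisector between the lone point and its closest point on the segment joining the other two, so its margin is half that distance. Since margins scale linearly with the points, scaling the configuration up meets any target margin for all eight dichotomies (the two all-same-label dichotomies have unbounded margin, since the hyperplane can be pushed arbitrarily far away):

```python
import math

def closest_on_segment(p, a, b):
    """Project p onto segment ab, clamping to the endpoints."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def max_margin_for_dichotomy(points, labels):
    """Max achievable margin for a 1-vs-2 labeling of three points.

    The optimal separator sits halfway between the lone point and the
    nearest point of the opposite class's segment, so the margin is
    half the distance between the two class hulls.
    """
    pos = [p for p, y in zip(points, labels) if y == +1]
    neg = [p for p, y in zip(points, labels) if y == -1]
    single, pair = (pos[0], neg) if len(pos) == 1 else (neg[0], pos)
    c = closest_on_segment(single, pair[0], pair[1])
    return math.dist(single, c) / 2

def min_margin_over_dichotomies(points):
    """Smallest max-margin over the six non-trivial dichotomies."""
    margins = []
    for labels in [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]:
        if len(set(labels)) == 1:
            continue  # all-same labelings: margin can be made arbitrarily large
        margins.append(max_margin_for_dichotomy(points, labels))
    return min(margins)

gamma = 5.0  # target margin (arbitrary choice for the demo)
base = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# The tightest dichotomy on the base triangle has margin 1/(2*sqrt(2)),
# so scaling by 4*gamma (> 2*sqrt(2)*gamma) pushes every margin past gamma.
scale = 4 * gamma
points = [(scale * x, scale * y) for x, y in base]
print(min_margin_over_dichotomies(points) >= gamma)  # True
```

So for any fixed margin requirement there is indeed a three-point set on which all 8 dichotomies are realized, which is exactly what makes the question above puzzling.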

Or, put differently: the growth function is a property of the hypothesis set. The large-margin requirement does not remove any hypotheses from the hypothesis set; it just prevents us from using particular hypotheses with particular training sets. That restriction is a property of the learning algorithm, but the VC analysis was independent of the learning algorithm. If the hypothesis set has not changed, how can the growth function change?