The degrees of freedom I was writing about are not the 3N ones of the N events. I was mentally adding support vectors until the parallel planes defining the margin are fixed by them. Perhaps I shouldn't have mentioned degrees of freedom at all, and just asked you to visualize how many support vectors there would have to be in order to freeze the planes defining the margin. But when I wrote about there being two degrees of freedom I meant the degrees of freedom of the orientation of the pair of parallel planes once they have been partly locked in position by one support vector on each. There are two coordinates needed to define the direction of the common normal of the pair of planes. To fix the orientation, continue adding support vectors. If a third support vector is added, there must be two on one plane. That plane will still not be fixed in orientation; it can rotate about an axis joining the two support vectors, and the parallel plane can be reoriented to keep it parallel, without separating any support vector from its plane. So only one degree of freedom of the normal is eliminated by having a second support vector on one plane. One more support vector is needed on either of the planes to completely fix the orientation of the two planes. So four support vectors is the maximum (with non-zero probability). There can be fewer than four support vectors, in which case the orientation of the two planes must have adjusted itself to maximize the margin given whatever number of support vectors there is.

The problem can be generalized to d dimensions and N events, and the answer is that the number of support vectors can be anything from 2 to min(d+1,N), depending on where the events are and how they're classified. Here's why:

Call "

" the unit normal to the planes and call "

" and "

" two support vectors, one on one plane and one on the other. Then the remaining support vectors must be different from both

and

, and they must satisfy either

or

. To completely specify the d dimensional

we need d equations, one of which is that

is a unit vector. That leaves d-1 additional support vectors required, for a total of d+1. There can be more support vectors if they accidentally happen to land on one of the planes, but that has zero probability. There must be fewer if the number of points, N, is less than d+1. There can be fewer if the plane orientations are not completely fixed by the support vectors, but have adjusted themselves as best they can to maximize the margin.

The problem could have been made harder by asking about the more general case of d dimensions, but then it would have been more of a mathematical exercise than of thinking about what support vectors really are. And I hadn't thought of the above explanation at the time I wrote the problem.