Quote:
Originally Posted by Anne Paulson
In the lectures, it's mentioned that often with an SVM you get the happy news that you have just a few support vectors, and you therefore know that your VC dimension is small and you can expect good generalization.
But how many vectors do you need before you're unhappy? Let's suppose you have a dataset with 7000 observations and 550 variables, an easy supposition for me because I do. Suppose you run an SVM and you discover that with a linear kernel, you have some 700 support vectors, and with a radial kernel, you have some 2000.
That seems like a lot; nearly half the points are support vectors. But if you attack the problem with another machine learning method like neural nets or multinomial regression, you will also have one or two thousand parameters, or maybe more, so you will also have a big VC dimension.

For an absolute estimate, the theoretical result mentioned at the end of Lecture 14 (last slide) shows the generalization error to be
bounded by the ratio of the number of support vectors to the number of examples. For a comparative estimate, the same result implies that the number of support vectors behaves more or less like the VC dimension of other models.
Of course there are situations where neither SVM nor other models will perform at a satisfactory level, as we would expect when the available data is not adequate to capture the complexity of the target.
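As an illustrative sketch of that absolute estimate (this is not code from the lectures, and the synthetic dataset and sizes are assumptions), one can fit an SVM, count its support vectors, and compute the ratio bounding the out-of-sample error:

```python
# Hypothetical example: the Lecture 14 result bounds the expected
# out-of-sample error by  (# support vectors) / (N - 1).
# Dataset and parameters here are made up for illustration.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=700, n_features=55, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    n_sv = clf.support_vectors_.shape[0]   # total support vectors
    bound = n_sv / (len(X) - 1)            # ratio bounding expected E_out
    print(f"{kernel}: {n_sv} support vectors, bound ~ {bound:.2f}")
```

If the RBF kernel ends up using far more support vectors than the linear kernel, this ratio quantifies the "unhappiness" in the question: nearly half the points as support vectors gives a bound near 0.5, which is not informative on its own.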
Quote:
So maybe you should be happy with the 2000 support vectors if you look like you're getting good generalization in cross-validation? Or you should be happy regardless of the number of support vectors, if the cross-validation shows good generalization? Or VC dimension doesn't matter at all if the cross-validation news is good?

Indeed, the cross-validation estimate can overrule (our interpretation of) the theoretical VC bound. The VC result is just a
bound that applies to all cases, while cross-validation provides a concrete estimate for the particular case at hand.
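To make that concrete (again a sketch with an assumed synthetic dataset, not the lectures' code), a 10-fold cross-validation estimate for the same model can be computed directly and compared against the worst-case bound:

```python
# Hypothetical example: cross-validation gives a concrete error estimate
# for this particular dataset, which may be far better than the generic
# support-vector / VC-style bound. Dataset is made up for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=700, n_features=55, random_state=0)

scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)  # 10-fold CV accuracy
cv_error = 1 - scores.mean()
print(f"cross-validation error estimate: {cv_error:.3f}")
```

Even when the support-vector ratio bounds the error only loosely (say, near 0.5), the cross-validation error can come out much smaller, which is exactly the sense in which it overrules the bound for a particular case.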