Quote:
Originally Posted by yaser
For an absolute estimate, the theoretical result mentioned at the end of Lecture 14 (last slide) shows the generalization error to be bounded by the ratio between the number of support vectors and the number of examples. For a comparative estimate, the same result implies that the number of support vectors behaves more or less like a VC dimension of other models.
Of course there are situations where neither SVM nor other models will perform to a satisfactory level, as we would expect if the resources of data are not adequate to capture the complexity of the target.
Indeed, the cross validation estimate can overrule (our interpretation of) the theoretical VC bound. The VC is just a bound that applies to all cases, while cross validation provides a concrete estimate for a particular case.

I have noticed that the support vector ratio sometimes provides a very good bound, at least with our toy problems. Suppose we had a situation like:
Model A: …, support vectors = 20
Model B: …, support vectors = 60

where the size of the data set is around 400. I would be tempted to choose the model with 20 support vectors, regardless of the cross-validation estimates. What are your thoughts?
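The bound from the last slide of Lecture 14 (leave-one-out error at most #SV/N) is easy to check numerically. Below is a rough sketch using a hand-rolled soft-margin linear SVM trained by subgradient descent; the data set, regularization strength, and step-size schedule are all made up for illustration, not taken from the course problems:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=500, lr=0.5):
    """Soft-margin linear SVM: subgradient descent on
    (lam/2)||w||^2 + (1/n) * sum_i hinge(y_i * (w.x_i + b))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # hinge-loss violators
        gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        gb = -y[viol].sum() / n
        step = lr / (1.0 + lam * t)              # slowly decaying step size
        w -= step * gw
        b -= step * gb
    return w, b

# Toy separable data: two Gaussian blobs (arbitrary choice for illustration).
rng = np.random.default_rng(0)
n = 40
X = np.vstack([rng.normal(2.0, 0.7, (n // 2, 2)),
               rng.normal(-2.0, 0.7, (n // 2, 2))])
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])

w, b = train_linear_svm(X, y)
margins = y * (X @ w + b)
n_sv = int((margins <= 1.0 + 0.05).sum())        # points on or inside the margin
bound = n_sv / n                                 # the #SV / N bound

# Leave-one-out cross validation: retrain without point i, test on point i.
errors = 0
for i in range(n):
    mask = np.arange(n) != i
    wi, bi = train_linear_svm(X[mask], y[mask])
    if np.sign(X[i] @ wi + bi) != y[i]:
        errors += 1
e_loo = errors / n

print(f"support vectors: {n_sv}/{n}, bound #SV/N = {bound:.3f}, E_loo = {e_loo:.3f}")
```

On well-separated toy data like this, E_loo typically comes out far below #SV/N, which matches your observation: the support vector ratio is a valid bound, and sometimes a usefully tight one, but cross validation gives the sharper per-case estimate.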