Quote:
Originally Posted by yaser
For an absolute estimate, the theoretical result mentioned at the end of Lecture 14 (last slide) shows the generalization error to be bounded by the ratio between the number of support vectors and the number of examples. For a comparative estimate, the same result implies that the number of support vectors behaves more or less like a VC dimension of other models.
Of course there are situations where neither SVM nor other models will perform to a satisfactory level, as we would expect if the resources of data are not adequate to capture the complexity of the target.
Indeed, the cross validation estimate can overrule (our interpretation of) the theoretical VC bound. The VC is just a bound that applies to all cases, while cross validation provides a concrete estimate for a particular case.

Prof Yaser,
Thanks for your clarifications. I have some questions in the context of this thread, problems 2, 3, etc. of the homework, and the generalization bounds.
1. Just a reiteration for clarity: for the bound at the end of Lecture 14 as applied to soft-margin SVMs, by number of SVs we mean the number of margin SVs, right? That would be consistent with the thought that the non-margin SVs end up pinned at the constraints (0 or C), so they aren't getting full 'freedom of expression' and therefore aren't 'independent parameters'.
2. I have been trying problems 2, 3, etc. using cvxopt. I used a simplistic rule: if an alpha is very close (say within a0 = 10^-5) to 0 or C, I round it to 0 or C respectively, and the remaining alphas become margin SVs. If the b's corresponding to these are consistent (meaning within b0 = 10^-3 of each other), then I conclude that there is nothing fundamentally unsound in what I am doing. I arrived at a0 and b0 by trial and error. Is this a sound approach?
3. I would imagine the ranges a0 and b0 can be derived in some principled way. For instance, I didn't really account for the relation between a0 and b0, or for how they relate to cvxopt's numeric error tolerance. Such a principled choice of support vectors would be part of what packages like libsvm provide. Is that correct?
4. I found that in problem 2 as well as 3, some of the classifiers have a single-digit number of margin SVs and some ran into multiple thousands. I am somewhat uncomfortable about this huge variation, but a visual perusal of the thousands of distinct evaluations of b indicated they are all close to each other. Moreover, the same code generates numbers that are essentially consistent with the margin support vectors and Ein discussed for the classification problem in the thread
http://book.caltech.edu/bookforum/showthread.php?t=4044
So I am hoping I am right in assuming that the distinct values of b being close to each other is a good indicator of soundness. Any comments?
5. Finally, in the context of the discussion in Lecture 17 on data snooping: we would have to use up to 100 SVMs (one versus one) or 10 SVMs (one versus rest) for our 10-way classification problem. The number of margin SVs ranges from single digits to thousands. In this context, how should we view the theoretical generalization bounds? They would seem to fail the rule of thumb of a ratio of 10. But if the Eout on actual test data is low, can we still go ahead with this?
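
For reference, the rounding rule from question 2 could be sketched roughly as below. This is only an illustration under assumptions: the function names are hypothetical, `alpha` is assumed to be the solver's solution already flattened into a NumPy array, `K` is assumed to be the precomputed symmetric kernel matrix on the training points, and the defaults for `a0` and `b0` are the trial-and-error values mentioned above, not principled choices.

```python
import numpy as np

def split_support_vectors(alpha, C, a0=1e-5):
    """Round alphas within a0 of 0 or C, and flag margin / bound SVs.

    a0 is an ad hoc tolerance (assumption), not derived from the solver.
    """
    alpha = np.asarray(alpha, dtype=float).copy()
    near_zero = alpha < a0        # also catches tiny negative solver noise
    near_C = alpha > C - a0
    alpha[near_zero] = 0.0
    alpha[near_C] = C
    margin = ~near_zero & ~near_C  # strictly between 0 and C after rounding
    return alpha, margin, near_C

def bias_estimates(alpha, y, K, margin):
    """One estimate of b per margin SV s: b = y_s - sum_n alpha_n y_n K(x_n, x_s)."""
    y = np.asarray(y, dtype=float)
    f = (alpha * y) @ K            # kernel expansion evaluated at every training point
    return y[margin] - f[margin]

def biases_consistent(b_values, b0=1e-3):
    """True if all b estimates lie within b0 of each other."""
    if b_values.size == 0:
        return False               # no margin SVs, so nothing to compare
    return float(b_values.max() - b_values.min()) <= b0
```

If the spread of the b estimates exceeds `b0`, that suggests either the tolerance `a0` is misclassifying some alphas or the solver has not converged tightly enough; it does not by itself say which.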