Re: How many support vectors is too many?
Posted by htlin (NTU, Taipei, Taiwan), 07-31-2013

Quote:
Originally Posted by hsolo
1. Just a reiteration for clarity -- for the bound at the end of Lecture 14 as applied to soft-margin SVMs, by the number of SVs we mean the number of margin SVs, right? That would be consistent with the thought process that the non-margin SVs end up at the constraints (0 or C) and so aren't getting full 'freedom of expression' and therefore aren't 'independent params'.
For the CV bound, the number of SVs includes both the "free SVs" (those with alpha strictly between 0 and C, which sit on the margin) and the "bounded SVs" (those with alpha = C, which violate the margin).
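
As a concrete illustration of that counting, here is a minimal sketch (my own, not from the course) that splits a solved vector of dual variables into free and bounded SVs; the tolerance tol is an arbitrary choice, not a prescribed value:

Code:
import numpy as np

def count_svs(alphas, C, tol=1e-5):
    """Split dual variables into free SVs (0 < alpha < C) and bounded SVs (alpha = C).

    tol absorbs the numerical fuzz of the QP solver; its value is a judgment call.
    """
    alphas = np.asarray(alphas).ravel()
    free = (alphas > tol) & (alphas < C - tol)   # strictly inside (0, C): on the margin
    bounded = alphas >= C - tol                  # at the upper bound: violating the margin
    return int(free.sum()), int(bounded.sum())

# n_free + n_bounded is the "number of SVs" that enters the CV bound
# n_free, n_bounded = count_svs(alphas, C)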

Quote:
Originally Posted by hsolo
2. I have been trying Problems 2, 3, etc. using cvxopt. I used a simplistic rule: if an alpha is very close (say within a range of a0 = 10^-5) to 0 or C, I round it to 0 or C respectively, and the remaining alphas become margin SVs. If the values of b corresponding to these are consistent (meaning, within a range of b0 = 10^-3 of each other), then I conclude that there is nothing fundamentally unsound in what I am doing. I came to a0 and b0 with some trial and error. Is this a sound approach?
It sounds reasonable.
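
For what it's worth, here is a minimal sketch of how the soft-margin dual can be handed to cvxopt's QP solver; the precomputed kernel matrix K, labels y in {-1, +1}, and C are assumed inputs, and the tiny ridge added to P is just my own guard against numerical issues, not part of the formulation:

Code:
import numpy as np
from cvxopt import matrix, solvers

def solve_soft_margin_dual(K, y, C):
    """Solve the soft-margin SVM dual: min (1/2) a'Pa + q'a  s.t.  0 <= a <= C, y'a = 0."""
    N = len(y)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    P = matrix(np.outer(y, y) * K + 1e-10 * np.eye(N))   # quadratic term y_i y_j K(x_i, x_j), plus a tiny ridge
    q = matrix(-np.ones(N))                               # linear term: maximize sum of alphas
    G = matrix(np.vstack([-np.eye(N), np.eye(N)]))        # encodes -alpha <= 0 and alpha <= C
    h = matrix(np.hstack([np.zeros(N), C * np.ones(N)]))
    A = matrix(y.reshape(1, -1))                          # equality constraint sum_i alpha_i y_i = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol['x']).ravel()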

Quote:
Originally Posted by hsolo
3. I would imagine the ranges a0 and b0 can be derived in some principled way -- for instance, I didn't really account for the relation between a0 and b0, or their relationship to cvxopt's numerical error tolerance. Such a principled choice of support vectors would be part of what packages like libsvm provide -- is that correct?
Numerical optimization is a difficult problem, so it is hard to define a fully principled way. Specialized packages like LIBSVM use careful implementations to mark at-bound alphas stably; general-purpose convex-programming packages may not do the same.

Quote:
Originally Posted by hsolo
4. I found that in Problems 2 and 3, some of the classifiers have a single-digit number of margin SVs and some ran into multiple thousands. I am somewhat uncomfortable about this huge variation, but a visual perusal of the thousands of distinct evaluations of b indicated they are all close to each other; moreover, the same code generates numbers that are essentially consistent with the margin support vectors and Ein discussed in the classification problem in the thread http://book.caltech.edu/bookforum/showthread.php?t=4044
So I am hoping I am right in assuming that the distinct values of b being close to each other is a good indicator of soundness. Any comments?
If the convex-programming package is working reasonably well, the values of b computed from the margin SVs should be close to each other, so that can act as a safety check.
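
A sketch of that safety check, under the same assumptions as above (kernel matrix K, labels y, solved alphas) and with tolerances that are my own choices:

Code:
import numpy as np

def check_b_consistency(alphas, y, K, C, tol=1e-5, b_tol=1e-3):
    """Compute b from every free SV and flag large disagreement.

    For a free SV i:  b = y_i - sum_j alpha_j y_j K(x_j, x_i).
    If the QP solution is sound, these values should agree to within b_tol.
    """
    alphas = np.asarray(alphas).ravel()
    y = np.asarray(y, dtype=float).ravel()
    free = np.where((alphas > tol) & (alphas < C - tol))[0]
    if len(free) == 0:
        return None, None, False          # no free SVs: cannot compute b this way
    bs = np.array([y[i] - np.sum(alphas * y * K[:, i]) for i in free])
    spread = bs.max() - bs.min()
    return bs.mean(), spread, spread < b_tol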

Quote:
Originally Posted by hsolo
5. Finally, in the context of the discussion in Lecture 17 on data snooping -- we would have to use up to 45 SVMs (one versus one) or 10 SVMs (one versus rest) for our 10-way classification problem. The number of margin SVs ranges from single digits to thousands. In this context, how should we view the theoretical generalization bounds? They would seem to fail the rule of thumb of a ratio of 10. But if the Eout on actual test data is low, can we still go ahead with this?
For the multiclass case, there are theoretical results that analyze the performance of one-versus-one and one-versus-rest. The results are not as simple as counting the number of internal models, and it is often difficult to decide theoretically which one is the better choice. In practice, people often favor one-versus-one SVMs for computational efficiency during training, because each SVM is trained on significantly less data than the whole set.
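
To make the efficiency point concrete, here is a hedged sketch of a one-versus-one training loop for a 10-class problem; train_svm stands in for whichever binary SVM trainer you use (cvxopt, LIBSVM, etc.) and is not a real library call:

Code:
from itertools import combinations
import numpy as np

def train_one_vs_one(X, y, train_svm, classes=range(10)):
    """Train one binary SVM per pair of classes (45 pairs for 10 classes).

    Each pairwise model only sees the examples of its two classes, which is
    why training tends to be cheaper than one-versus-rest, even though more
    models are fit.
    """
    models = {}
    for c1, c2 in combinations(classes, 2):
        mask = (y == c1) | (y == c2)            # keep only the two classes for this model
        y_binary = np.where(y[mask] == c1, +1, -1)
        models[(c1, c2)] = train_svm(X[mask], y_binary)
    return models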

Hope this helps.
__________________
When one teaches, two learn.