Quote:
Originally Posted by hsolo
Is the 'heuristic' number of parameters (the VC dimension proxy) to be used when reasoning about generalization then the number of margin support vectors, which is << the number of all support vectors?
When we use kernel functions with soft SVMs (problem 2 etc), where there is no explicit w, does the above translate to :
* 1 ==> Use all support vectors to compute the Σ (sum) term in the hypothesis function g()
* 2 ==> Use only margin support vectors for b (which is also used in g())
I was wondering if this aspect was covered in the lecture or any of the additional material; I seem to have missed it.

The computation of w involves all support vectors, margin and otherwise, since it involves all the α_n's that are bigger than zero. Assuming w has been computed, the computation of b, for both hard and soft margins, involves any one support vector (a margin support vector in the case of soft margin), since it is based on solving the equation y_n (w^T x_n + b) = 1 for b.
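As a concrete illustration, here is a minimal sketch of these two computations on a made-up toy dataset. The alphas are a hand-worked dual solution for this particular data, used purely for illustration, not the output of any actual solver:

```python
import numpy as np

# Hypothetical toy dataset: two margin support vectors and two
# interior points. alpha was solved by hand for this data.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha = np.array([0.5, 0.5, 0.0, 0.0])  # alpha_n > 0 marks support vectors

# w sums over every point with alpha_n > 0 (all support vectors):
sv = alpha > 0
w = (alpha[sv] * y[sv]) @ X[sv]

# b uses any one (margin) support vector, solving y_n (w^T x_n + b) = 1.
# Since y_n is +/-1, dividing by y_n is multiplying by it: b = y_n - w.x_n.
n = np.flatnonzero(sv)[0]
b = y[n] - w @ X[n]
```

Any margin support vector gives the same b, which is a handy consistency check on a solved dual.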
In the case of kernels, the explicit evaluation of w followed by taking an inner product with a point x is replaced by evaluating the kernel K(x_n, x) with two arguments: one is a support vector x_n (margin or otherwise) and the other is the point x, and repeating that for all support vectors (margin or otherwise).
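A minimal sketch of this kernel evaluation, again on made-up toy data with a hand-worked alpha (a linear kernel is used only so the result is easy to check by hand; any valid kernel could be swapped in):

```python
import numpy as np

def K(a, b):
    return a @ b  # linear kernel for illustration; swap in any valid kernel

# Hypothetical toy dataset with a hand-worked dual solution.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha = np.array([0.5, 0.5, 0.0, 0.0])  # alpha_n > 0 marks support vectors
sv = np.flatnonzero(alpha > 0)

# b in kernel form: pick any one margin support vector x_m and solve
# y_m (sum_n alpha_n y_n K(x_n, x_m) + b) = 1, i.e. b = y_m - sum(...).
m = sv[0]
b = y[m] - sum(alpha[n] * y[n] * K(X[n], X[m]) for n in sv)

def g(x):
    # w is never formed; each term pairs one support vector with x via K.
    return np.sign(sum(alpha[n] * y[n] * K(X[n], x) for n in sv) + b)
```

Note that the sum in g runs over all support vectors, while b came from a single margin support vector, matching the distinction discussed above.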