Quote:
Originally Posted by yaser
1. All of them for computing w, since any vector with alpha_n > 0 will contribute to the derived solution for w.
2. Only margin SVs for computing b, since we need an equation, not an inequality, to solve for b after knowing w.

Is the 'heuristic' number of parameters (the VC-dimension proxy) to be used when reasoning about generalization then the number of margin support vectors, which can be much smaller than the number of all support vectors?
When we use kernel functions with soft-margin SVMs (problem 2, etc.), where there is no explicit w, does the above translate to:
* 1 ==> Use all support vectors to compute the summation term in the hypothesis function g()
* 2 ==> Use only the margin support vectors to compute b (which is also used in g())
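If I've understood the split correctly, the bookkeeping would look something like the sketch below. This is only an illustration under my own assumptions: a linear kernel, a tiny hand-made 1-D data set, and dual coefficients alpha that I solved by hand for it (none of this is from the lecture). All support vectors (alpha_n > 0) enter the kernel summation; only the margin SVs (0 < alpha_n < C) are used to solve for b, since they satisfy the equality y_m (signal(x_m) + b) = 1.

```python
import numpy as np

# Hypothetical toy setup: linear kernel, 1-D data, hand-solved duals.
def kernel(x, z):
    return x * z  # swap in any K(x, z), e.g. an RBF

X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1.0, -1.0, 1.0, 1.0])
C = 10.0                                 # soft-margin budget
alpha = np.array([0.0, 0.5, 0.5, 0.0])   # dual solution for this toy set

tol = 1e-8
sv = alpha > tol                            # all SVs: alpha_n > 0
margin = (alpha > tol) & (alpha < C - tol)  # margin SVs: 0 < alpha_n < C

# Point 1: the summation term in g() runs over ALL support vectors.
def signal(x):
    return sum(alpha[n] * y[n] * kernel(X[n], x) for n in np.where(sv)[0])

# Point 2: b is solved from the margin SVs only, where
# y_m * (signal(x_m) + b) = 1 holds with equality; averaging over
# all margin SVs just smooths numerical error.
b = np.mean([y[m] - signal(X[m]) for m in np.where(margin)[0]])

def g(x):
    return np.sign(signal(x) + b)
```

Here the outer points x = ±2 have alpha_n = 0 and drop out entirely, while x = ±1 are interior (margin) SVs that both fix w implicitly and determine b.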
I was wondering whether this aspect was covered in the lecture or in any of the additional material that I may have missed.