 rainbow 08-19-2012

Instead of using the SVM for pure classification, is it possible to return probabilities in the form or by any other transform?

 htlin 08-19-2012

Yes, the usual one used for SVMs was proposed by Platt:

http://citeseerx.ist.psu.edu/viewdoc...10.1.1.41.1639

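which is of the form $P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + \exp(A f(\mathbf{x}) + B)}$, where $f(\mathbf{x})$ is the SVM output, and estimates $A$ and $B$ by a logistic-regression-like optimization problem. An improved implementation for calculating $A$ and $B$ can be found in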

Hsuan-Tien Lin, Chih-Jen Lin, and Ruby C. Weng. A Note on Platt's Probabilistic Outputs for Support Vector Machines. Machine Learning, 68(3), 267-276, 2007.

http://www.csie.ntu.edu.tw/~htlin/pa.../plattprob.pdf
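
To make the fitting step concrete, here is a minimal sketch in plain Python. It assumes Platt's form $P(y=1 \mid \mathbf{x}) = 1/(1+\exp(A f(\mathbf{x}) + B))$ and Platt's regularized targets, and it uses plain gradient descent purely for illustration; it is not the Newton-style procedure of the note cited above. The function names, the step size `lr`, and the toy scores are assumptions made up for this example.

```python
import math


def platt_targets(labels):
    """Platt's regularized targets: values in (0, 1) instead of hard 0/1."""
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    return [t_pos if y > 0 else t_neg for y in labels]


def platt_prob(f, A, B):
    """P(y = 1 | f) = 1 / (1 + exp(A*f + B)), computed without overflow."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit A and B by gradient descent on the cross-entropy loss.

    Illustrative only: step size and iteration count are arbitrary choices.
    """
    targets = platt_targets(labels)
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = len(labels) - n_pos
    # Start from A = 0 and B set so the initial output equals the class prior.
    A, B = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))
    n = len(scores)
    for _ in range(n_iter):
        grad_A = grad_B = 0.0
        for f, t in zip(scores, targets):
            p = platt_prob(f, A, B)
            # With z = A*f + B, the derivative of the loss w.r.t. z is (t - p).
            grad_A += (t - p) * f
            grad_B += (t - p)
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B


if __name__ == "__main__":
    # Hypothetical SVM outputs f(x) on held-out data, with their true labels.
    scores = [-2.0, -1.5, -1.0, -0.3, 0.2, 0.4, 1.0, 1.5, 2.0]
    labels = [-1, -1, -1, 1, -1, 1, 1, 1, 1]
    A, B = fit_platt(scores, labels)
    print("A =", A, "B =", B)
    print("P(y=1 | f=0) =", platt_prob(0.0, A, B))
```

With this sign convention a useful fit has negative $A$, so that larger SVM outputs map to larger probabilities. In practice the scores should come from data not used to train the SVM (or from cross-validation), otherwise the fitted probabilities tend to be overconfident.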

Hope this helps.

 rainbow 08-20-2012

Thanks!

 patrickjtierney 08-22-2012

Yes. Thank you. Very interesting. I read both papers (well, skimmed some parts) and basically followed but I do have a general question.

I can understand A as a saturation factor or gain, but at first glance B is a little confusing. If B is non-zero, then the probability at the decision boundary will not be 1/2.

Is the reason for needing non-zero B that the mapping from Y->T no longer just maps +1 to 1, and -1 to 0, but rather to two values in (0,1) based on the relative number of +1s to -1s?

 samirbajaj 08-22-2012

And just out of curiosity - as an extension to the original question:

Can SVMs be used for regression? If so, do they perform better than the regression methods we have learned about in the course?

Thanks.

-Samir

 htlin 08-23-2012

 patrickjtierney: Yes. Thank you. Very interesting. I read both papers (well, skimmed some parts) and basically followed but I do have a general question. I can understand A as a saturation factor or gain, but at first glance B is a little confusing. If B is non-zero, then the probability at the decision boundary will not be 1/2. Is the reason for needing non-zero B that the mapping from Y->T no longer just maps +1 to 1, and -1 to 0, but rather to two values in (0,1) based on the relative number of +1s to -1s?
You are very right. My personal interpretation is that $B$ provides an opportunity to calibrate the boundary of the SVM for probability estimates. Recall that the SVM is rooted in the large-margin idea, and hence its hyperplane is "right in the middle of the two classes." Arguably, for probability estimates, a good hyperplane (the one where the estimated probability is $1/2$) should be somewhat away from the majority class. So there may be a need to "shift" the hyperplane by $B$.
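
The "shift" can be written out in a few lines. This is only a sketch, assuming Platt's form with $f(\mathbf{x})$ denoting the SVM output (for example $\mathbf{w}^T\mathbf{x} + b$ in the linear case) and $A \neq 0$:

```latex
% Assumed form (Platt), with f(x) the SVM output:
P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + \exp\bigl(A f(\mathbf{x}) + B\bigr)}

% On the SVM decision boundary, f(x) = 0:
P(y = 1 \mid \mathbf{x}) \Big|_{f(\mathbf{x}) = 0} = \frac{1}{1 + e^{B}}
\quad \bigl(= \tfrac{1}{2} \text{ only if } B = 0\bigr)

% The probability-1/2 surface is where the exponent vanishes:
A f(\mathbf{x}) + B = 0 \iff f(\mathbf{x}) = -\frac{B}{A}
```

So the surface of probability $1/2$ is parallel to the SVM boundary in terms of $f$, but offset by $-B/A$; it coincides with the SVM boundary only when $B = 0$.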

Hope this helps.

 htlin 08-23-2012

 samirbajaj: And just out of curiosity - as an extension to the original question: Can SVMs be used for regression? If so, do they perform better than the regression methods we have learned about in the course? Thanks. -Samir
Yes, there are several extensions of SVM for regression. One of them was proposed by the original SVM author and is commonly named $\epsilon$-support vector regression. $\epsilon$-SVR can be found in common SVM packages such as LIBSVM and shares many interesting properties with the classification one. The other is extended from linear regression and is commonly named least-squares SVM.
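
For intuition about what sets $\epsilon$-SVR apart, here is a small sketch of the $\epsilon$-insensitive error it is built around, next to the squared error of ordinary least squares. The function names and toy numbers are made up for illustration; this is not LIBSVM code.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-point loss max(0, |y - f(x)| - epsilon).

    Errors inside the epsilon "tube" cost nothing; outside it the cost
    grows linearly with the size of the error.
    """
    return [max(0.0, abs(y - p) - epsilon) for y, p in zip(y_true, y_pred)]


def squared_loss(y_true, y_pred):
    """Per-point squared error, as used by least-squares regression."""
    return [(y - p) ** 2 for y, p in zip(y_true, y_pred)]


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0]
    y_pred = [1.05, 2.5, 2.0]
    print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1))
    print(squared_loss(y_true, y_pred))
```

Points whose error stays within $\epsilon$ contribute no loss, which is part of why the solution ends up depending only on a subset of the training points, much like support vectors in classification.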

Hope this helps.

