  #5  
09-16-2012, 05:30 PM
htlin
NTU
 
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 601
Re: Which kernel to use?

Quote:
Originally Posted by rainbow View Post
In the course we have applied the gaussian, polynomial and linear kernel on different problems and learned how to tune them wrt. regularization to avoid overfitting.

- For a given problem, it seems that different kernels return different numbers of support vectors (although all with zero training error). Since the generalization ability of the SVM model depends very much on the number of support vectors, is the actual choice of kernel a "parameter to be tuned" as well?

- Is the choice of kernel application-specific or data-specific?

- Any rule of thumb?

Yes, choosing the kernel is like choosing an algorithm/model/hypothesis set, and it is important for SVMs. The validation techniques discussed in the class can help in making the choice.
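
To make the validation idea concrete, here is a minimal sketch that treats the kernel as one more thing to select on a held-out set. To stay dependency-free it uses a simple kernel perceptron as a stand-in learner, not an SVM solver; the toy XOR-like data, the function names, and the fixed gamma are all assumptions made for illustration. On data like this one would expect the nonlinear candidate to validate better, but the point is the selection procedure, not the numbers.

```python
import math
import random

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=1.0):
    # gamma is fixed here for brevity; in practice it would be validated too.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Stand-in learner (NOT an SVM): returns per-example mistake counts."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            score = sum(a * yj * kernel(xj, xi) for a, xj, yj in zip(alpha, X, y))
            if yi * score <= 0:  # misclassified (or undecided): update
                alpha[i] += 1
    return alpha

def predict(X_train, y_train, alpha, kernel, x):
    score = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X_train, y_train))
    return 1 if score > 0 else -1

def validation_error(kernel, X_tr, y_tr, X_val, y_val):
    alpha = train_kernel_perceptron(X_tr, y_tr, kernel)
    wrong = sum(predict(X_tr, y_tr, alpha, kernel, x) != t for x, t in zip(X_val, y_val))
    return wrong / len(X_val)

def make_xor_data(n, rng):
    # Hypothetical toy problem: label is the sign of x1 * x2.
    X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    y = [1 if a * b > 0 else -1 for a, b in X]
    return X, y

rng = random.Random(0)
X_tr, y_tr = make_xor_data(80, rng)
X_val, y_val = make_xor_data(40, rng)

candidates = {"linear": linear_kernel, "rbf": rbf_kernel}
errors = {name: validation_error(k, X_tr, y_tr, X_val, y_val)
          for name, k in candidates.items()}
best = min(errors, key=errors.get)  # kernel with the lowest validation error
print(errors, "->", best)
```

With a real SVM package the structure stays the same: train once per candidate kernel (and per parameter setting), then compare validation or cross-validation errors.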

The "best" kernel can be data-specific. On the other hand, several properties of popular kernels can serve as rules of thumb:

* Gaussian-RBF kernel: a suitable first choice for general nonlinear learning (classification). It has fewer parameters than the polynomial kernel, is numerically more stable, and offers a wide range of fitting power (but requires careful tuning).

S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15(2003), 1667-1689.

C.-W. Hsu, C.-C. Chang, C.-J. Lin. A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University. July, 2003.
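
A small sketch of why the Gaussian-RBF kernel spans such a wide range of fitting power, and why its width needs tuning. It assumes the common parameterization K(x, z) = exp(-gamma * ||x - z||^2); the three points and the gamma values are made up. With a tiny gamma every pair of points looks almost identical (Gram matrix near all ones, little fitting power), while with a huge gamma every point is similar only to itself (Gram matrix near the identity, enough power to memorize). The exponentially spaced grids at the end are in the style of the grid search described in the practical guide cited above; the exact ranges are an assumption of this sketch.

```python
import math

def rbf_kernel(x, z, gamma):
    # Assumed parameterization: K(x, z) = exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix(X, gamma):
    return [[rbf_kernel(x, z, gamma) for z in X] for x in X]

# Three hypothetical training points.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

tiny = gram_matrix(X, gamma=1e-6)  # every entry close to 1
huge = gram_matrix(X, gamma=1e3)   # close to the identity matrix

# Exponentially spaced candidate values for a (C, gamma) grid search.
C_grid = [2.0 ** k for k in range(-5, 16, 2)]
gamma_grid = [2.0 ** k for k in range(-15, 4, 2)]

for row in tiny:
    print([round(v, 4) for v in row])
for row in huge:
    print([round(v, 4) for v in row])
```

Each (C, gamma) pair on the grid would then be scored by validation, which is the "careful tuning" part.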

* Perceptron kernel: similar to Gaussian in performance but with fewer parameters (only C needs to be tuned).

Hsuan-Tien Lin and Ling Li. Support Vector Machinery for Infinite Ensemble Learning. Journal of Machine Learning Research, 9(2), 285-312, 2008.
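
A minimal sketch of the perceptron kernel, assuming the form K(x, z) = -||x - z|| (Euclidean norm, with the constant offset dropped, which should not matter for the standard SVM dual). The points and the scaling factor are made up. The second half illustrates why no kernel parameter is left to tune: rescaling the inputs only rescales the whole Gram matrix, and that factor can be absorbed into C.

```python
import math

def perceptron_kernel(x, z):
    # Assumed form: K(x, z) = -||x - z||_2, constant offset dropped.
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))

def gram_matrix(X, kernel):
    return [[kernel(x, z) for z in X] for x in X]

# Three hypothetical points.
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
K = gram_matrix(X, perceptron_kernel)

# Rescaling the inputs by a positive factor rescales K by the same factor.
scale = 2.5
X_scaled = [tuple(scale * v for v in x) for x in X]
K_scaled = gram_matrix(X_scaled, perceptron_kernel)

for row, row_scaled in zip(K, K_scaled):
    print(row, row_scaled)
```

Most SVM packages accept a precomputed Gram matrix, so a kernel like this can usually be tried without modifying the solver.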

* Linear "kernel": suitable when # features >> # examples, which may suggest that going nonlinear is not needed. In that case, there are ultra-fast solvers (such as LIBLINEAR) that are much faster than general dual solvers (such as LIBSVM).

G.-X. Yuan, C.-H. Ho, and C.-J. Lin. Recent Advances of Large-scale Linear Classification. To appear in Proceedings of IEEE, 2012.

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9(2008), 1871-1874.
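
One way to see why the linear case admits specialized solvers: with the linear "kernel", the dual decision function sum_i alpha_i y_i <x_i, x> + b collapses into a single weight vector w = sum_i alpha_i y_i x_i, so the model can be stored, and optimized, directly as w. The sketch below only checks that identity on made-up numbers; the support vectors, coefficients, and bias are hypothetical and do not come from any solver.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical support vectors, labels and dual coefficients.
support_vectors = [(1.0, 0.0, 2.0), (0.0, 1.0, -1.0), (2.0, 2.0, 0.0)]
labels = [1, -1, 1]
alphas = [0.5, 0.75, 0.25]
bias = -0.1

def dual_score(x):
    # Kernel form: one inner product per support vector.
    return sum(a * y * dot(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + bias

# Collapse the expansion into one weight vector, computed once.
dim = len(support_vectors[0])
w = [sum(a * y * sv[k] for a, y, sv in zip(alphas, labels, support_vectors))
     for k in range(dim)]

def primal_score(x):
    # Linear form: a single inner product, independent of the number of SVs.
    return dot(w, x) + bias

x_new = (1.0, -2.0, 0.5)
print(dual_score(x_new), primal_score(x_new))
```

Prediction cost drops from (number of support vectors) x (dimension) to just (dimension), and training can work on w directly instead of on the kernel matrix.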

An extension is the low-order polynomial "kernel" that uses the fast solvers, which can be competitive with Gaussian RBF but much faster (in training and testing).

Y.-W. Chang, C.-J. Hsieh, K.-W. Chang, M. Ringgaard, and C.-J. Lin. Training and Testing Low-degree Polynomial Data Mappings via Linear SVM. Journal of Machine Learning Research, 11(2010), 1471-1490.
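
A sketch of the idea behind feeding a low-degree polynomial mapping to a linear solver, assuming the degree-2 kernel (1 + <x, z>)^2 with both kernel constants set to 1 for simplicity. The explicit feature map below reproduces that kernel as a plain inner product, so a linear solver run on the mapped data trains the same kind of model. The function names and test vectors are made up for illustration.

```python
import math
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    # Assumed kernel: (1 + <x, z>)^2
    return (1.0 + dot(x, z)) ** 2

def poly2_map(x):
    """Explicit degree-2 map phi with <phi(x), phi(z)> == (1 + <x, z>)^2."""
    r2 = math.sqrt(2.0)
    feats = [1.0]                                         # constant term
    feats += [r2 * a for a in x]                          # linear terms
    feats += [a * a for a in x]                           # squares
    feats += [r2 * a * b for a, b in combinations(x, 2)]  # cross terms
    return feats

x = (1.0, 2.0, 3.0)
z = (0.5, -1.0, 2.0)

via_kernel = poly2_kernel(x, z)
via_mapping = dot(poly2_map(x), poly2_map(z))
print(via_kernel, via_mapping, len(poly2_map(x)))
```

For n input features the mapped vector has (n + 1)(n + 2) / 2 entries, which is why this trick is attractive for low degrees (and sparse data) but not for high ones.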

Hope this helps.
__________________
When one teaches, two learn.