Quote:
Originally Posted by rainbow
In the course we have applied the Gaussian, polynomial and linear kernels to different problems and learned how to tune them with respect to regularization to avoid overfitting.
 For a given problem, it seems like different kernels return different numbers of support vectors (although with zero training error). Since the generalization ability of the SVM model depends very much on the number of support vectors, is the actual choice of kernel a "parameter to be tuned" as well?
 Is the choice of kernel application specific, data specific?
 Any rule of thumb?

Yes, choosing the kernel is like choosing an algorithm/model/hypothesis set, and it is important for SVMs. The validation techniques discussed in class can be helpful for making the choice.
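To make the "validation for kernel choice" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the toy data, the kernel parameter values, and especially the learner, which is a simple kernel perceptron used as a stand-in so the example runs with the standard library only. It is not an SVM; in practice you would train with an SVM package such as LIBSVM and keep the same selection loop.

```python
import math
import random

# Candidate kernels; the parameter values are illustrative assumptions.
def linear(x, z):
    return sum(a * b for a, b in zip(x, z))

def poly2(x, z):
    return (1.0 + linear(x, z)) ** 2

def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Stand-in learner (NOT an SVM): returns one dual coefficient per example."""
    alpha = [0.0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, xi in enumerate(X):
            score = sum(a * yj * kernel(xj, xi)
                        for a, yj, xj in zip(alpha, y, X) if a)
            if y[i] * score <= 0:  # misclassified: strengthen this example
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:  # training set separated, stop early
            break
    return alpha

def predict(alpha, X, y, kernel, x):
    score = sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X) if a)
    return 1 if score > 0 else -1

def validation_accuracy(kernel, X_tr, y_tr, X_val, y_val):
    alpha = train_kernel_perceptron(X_tr, y_tr, kernel)
    hits = sum(predict(alpha, X_tr, y_tr, kernel, x) == t
               for x, t in zip(X_val, y_val))
    return hits / len(X_val)

def make_ring_data(n, rng):
    """Hypothetical toy data: +1 inside the unit circle, -1 outside."""
    X = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(n)]
    y = [1 if a * a + b * b < 1.0 else -1 for a, b in X]
    return X, y

rng = random.Random(0)
X, y = make_ring_data(120, rng)
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

# Treat the kernel as one more "parameter": score each on held-out data.
candidates = {"linear": linear, "poly2": poly2, "rbf": rbf}
scores = {name: validation_accuracy(k, X_tr, y_tr, X_val, y_val)
          for name, k in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

The validation score of the winning kernel is optimistically biased once it has been used for selection, so any final performance estimate should come from data not touched by this loop.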
The "best" kernel can be data specific. On the other hand, there are several properties of popular kernels that can serve as rules of thumb:
* Gaussian-RBF kernel: a suitable first-hand choice for general nonlinear learning (classification). It has fewer parameters than the polynomial kernel, is numerically more stable, and has a wide range of fitting power (but requires careful tuning).
S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15(2003), 1667-1689.
C.-W. Hsu, C.-C. Chang, C.-J. Lin. A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University. July, 2003.
* Perceptron kernel: similar to Gaussian in performance but with fewer parameters (only the regularization parameter C needs to be tuned).
Hsuan-Tien Lin and Ling Li. Support Vector Machinery for Infinite Ensemble Learning. Journal of Machine Learning Research, 9(2), 285-312, 2008.
* Linear "kernel": suitable when # features >> # examples, which may suggest that going nonlinear is not needed. In that case, there are solvers (such as LIBLINEAR) that are much faster than general dual solvers (such as LIBSVM).
G.-X. Yuan, C.-H. Ho, and C.-J. Lin. Recent Advances of Large-scale Linear Classification. To appear in Proceedings of the IEEE, 2012.
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9(2008), 1871-1874.
An extension is the low-order polynomial "kernel" that uses the fast linear solvers on explicitly mapped features. It can be competitive with Gaussian RBF but much faster (in training and testing).
Y.-W. Chang, C.-J. Hsieh, K.-W. Chang, M. Ringgaard, and C.-J. Lin. Training and Testing Low-degree Polynomial Data Mappings via Linear SVM. Journal of Machine Learning Research, 11(2010), 1471-1490.
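As a small illustration of the last point, the sketch below shows why a low-degree polynomial kernel can be handed to a linear solver: the degree-2 kernel equals an ordinary inner product after an explicit feature mapping. The specific kernel form (1 + x.z)^2 and the function names are assumptions chosen for the example, not necessarily the exact setup used in the paper above.

```python
import math
from itertools import combinations

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, assumed form (1 + x.z)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2

def poly2_map(x):
    """Explicit feature map phi with phi(x).phi(z) == poly2_kernel(x, z)."""
    s = math.sqrt(2.0)
    feats = [1.0]                                        # constant term
    feats += [s * a for a in x]                          # sqrt(2) * x_i
    feats += [a * a for a in x]                          # x_i^2
    feats += [s * a * b for a, b in combinations(x, 2)]  # sqrt(2) * x_i * x_j, i < j
    return feats

x = (1.0, 2.0, -1.0)
z = (0.5, -1.0, 3.0)

kernel_value = poly2_kernel(x, z)
mapped_value = sum(a * b for a, b in zip(poly2_map(x), poly2_map(z)))

# The two numbers agree up to floating-point rounding.
print(kernel_value, mapped_value)
print("mapped dimension:", len(poly2_map(x)))
```

For n input features the mapped vector has (n+1)(n+2)/2 entries, so this trick pays off when the degree is low and the data are sparse or n is moderate; the mapped features can then be fed to a linear solver such as LIBLINEAR.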
Hope this helps.