
#1




Which kernel to use?
In the course we have applied the Gaussian, polynomial and linear kernels to different problems and learned how to tune them with respect to regularization to avoid overfitting.
For a given problem, different kernels seem to return different numbers of support vectors (although all with zero training error). Since the generalization ability of the SVM model depends very much on the number of support vectors, is the choice of kernel a "parameter to be tuned" as well? Is the choice of kernel application specific or data specific? Is there any rule of thumb?
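For reference, the dependence on the number of support vectors mentioned above is, as I recall it from the SVM lecture, the following bound on the expected out-of-sample error. Here N is the number of training examples; treat the exact form as my recollection rather than a quotation.

```latex
\mathbb{E}\left[E_{\text{out}}\right] \le \frac{\mathbb{E}\left[\#\text{ of support vectors}\right]}{N - 1}
```

So a kernel that yields fewer support vectors on the same data gives a smaller bound, which is what motivates the question.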
#2




Re: Which kernel to use?
Although only briefly mentioned in the lectures, machine selection of an appropriate kernel is one of the approaches that may be taken. The caveat is that considering additional kernels increases the complexity of the model and thus requires larger data sets to mitigate the risk of overfitting. It is also possible to apply multiple kernels and aggregate their outputs to produce the final model.
I suspect that selecting a kernel without snooping in the data is more art than science, but it may be guided by one's understanding (read: intuition) of the expected characteristics of the data.
#3




Re: Which kernel to use?

#4




Re: Which kernel to use?
Quote:
The lowest E_cv should be a good measure of generalization for the selected kernel.
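A minimal sketch of what picking the kernel by E_cv could look like. To keep it dependency-free, a kernel perceptron stands in for the SVM solver (this is my substitution, not what the course uses), and the two-ring toy data, the kernel parameters and all function names are made up for illustration. With a real SVM you would also validate C and the kernel parameters in the same loop.

```python
import math
import random

# Three candidate kernels on plain Python lists (parameters are illustrative).
def linear(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial(u, v, degree=2):
    return (1.0 + linear(u, v)) ** degree

def gaussian(u, v, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def train(kernel, X, y, epochs=20):
    """Kernel perceptron (a stand-in for an SVM solver); returns {index: alpha}."""
    alpha = {}
    for _ in range(epochs):
        mistakes = 0
        for i, (x, label) in enumerate(zip(X, y)):
            score = sum(a * y[j] * kernel(X[j], x) for j, a in alpha.items())
            if label * score <= 0:  # misclassified (or undecided): update
                alpha[i] = alpha.get(i, 0) + 1
                mistakes += 1
        if mistakes == 0:  # training set fully separated
            break
    return alpha

def predict(kernel, X, y, alpha, x):
    score = sum(a * y[j] * kernel(X[j], x) for j, a in alpha.items())
    return 1 if score > 0 else -1

def cv_error(kernel, X, y, folds=5):
    """Fraction of held-out points misclassified, pooled over all folds."""
    wrong = 0
    for f in range(folds):
        train_idx = [i for i in range(len(X)) if i % folds != f]
        test_idx = [i for i in range(len(X)) if i % folds == f]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        alpha = train(kernel, X_tr, y_tr)
        wrong += sum(predict(kernel, X_tr, y_tr, alpha, X[i]) != y[i]
                     for i in test_idx)
    return wrong / len(X)

def make_rings(n, rng):
    """Toy data: class +1 inside the unit disk, class -1 on an outer ring."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        r = rng.uniform(0.0, 1.0) if label == 1 else rng.uniform(2.0, 3.0)
        t = rng.uniform(0.0, 2.0 * math.pi)
        X.append([r * math.cos(t), r * math.sin(t)])
        y.append(label)
    return X, y

rng = random.Random(0)
X, y = make_rings(60, rng)
kernels = {"linear": linear, "polynomial": polynomial, "gaussian": gaussian}
e_cv = {name: cv_error(k, X, y) for name, k in kernels.items()}
best = min(e_cv, key=e_cv.get)  # kernel with the lowest cross-validation error
print(e_cv, "-> selected:", best)
```

Keep in mind that every extra kernel tried this way is one more model in the validation set, so the selected E_cv becomes a slightly optimistic estimate.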
#5




Re: Which kernel to use?
Quote:
The "best" kernel can be data specific. On the other hand, several properties of popular kernels can serve as rules of thumb:

* Gaussian-RBF kernel: a suitable first-hand choice for general nonlinear learning (classification). It has fewer parameters than the polynomial kernel and is numerically more stable, with a wide range of fitting power (but it requires careful tuning).
  S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15(2003), 1667-1689.
  C.-W. Hsu, C.-C. Chang, C.-J. Lin. A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University. July, 2003.

* Perceptron kernel: similar to the Gaussian in performance but with fewer parameters (only the regularization parameter C needs to be tuned).
  Hsuan-Tien Lin and Ling Li. Support Vector Machinery for Infinite Ensemble Learning. Journal of Machine Learning Research, 9(2), 285-312, 2008.

* Linear "kernel": suitable when # features >> # examples, which may suggest that going nonlinear is not needed. In that case, there are solvers (such as LIBLINEAR) that are much faster than general dual solvers (such as LIBSVM).
  G.-X. Yuan, C.-H. Ho, and C.-J. Lin. Recent Advances of Large-scale Linear Classification. To appear in Proceedings of the IEEE, 2012.
  R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9(2008), 1871-1874.
  An extension is the low-order polynomial "kernel" that uses the fast linear solvers, which can be competitive with the Gaussian RBF but much faster (in training and testing).
  Y.-W. Chang, C.-J. Hsieh, K.-W. Chang, M. Ringgaard, and C.-J. Lin. Training and Testing Low-degree Polynomial Data Mappings via Linear SVM. Journal of Machine Learning Research, 11(2010), 1471-1490.

Hope this helps.
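To illustrate the low-order polynomial idea from the last rule of thumb: instead of handing a degree-2 polynomial kernel to a dual solver, one can map the inputs explicitly and pass the mapped vectors to a fast linear solver. Below is a rough sketch of such a mapping, with a check that its inner product matches the kernel (1 + u.v)^2. The function names are my own, and this is only the textbook construction, not code from the cited paper.

```python
import itertools
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_map(x):
    """Explicit degree-2 map phi with dot(phi(u), phi(v)) == (1 + dot(u, v))**2."""
    d = len(x)
    feats = [1.0]                                   # constant term
    feats += [math.sqrt(2.0) * xi for xi in x]      # linear terms
    feats += [xi * xi for xi in x]                  # squared terms
    feats += [math.sqrt(2.0) * x[i] * x[j]          # pairwise cross terms
              for i, j in itertools.combinations(range(d), 2)]
    return feats

u = [0.5, -1.0, 2.0]
v = [1.5, 0.25, -0.5]
kernel_value = (1.0 + dot(u, v)) ** 2           # what a dual (kernel) solver evaluates
mapped_value = dot(poly2_map(u), poly2_map(v))  # what a linear solver sees after mapping
print(kernel_value, mapped_value, len(poly2_map(u)))
```

The mapped dimension grows quadratically with the number of features, so this only pays off for low degrees and, typically, sparse inputs.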
__________________
When one teaches, two learn. 
#6




Re: Which kernel to use?
Thank you all for the useful input. We have some reading to do... :)


