#1
In the course we have applied the Gaussian, polynomial, and linear kernels to different problems and learned how to tune them with respect to regularization to avoid overfitting.

- For a given problem, different kernels seem to return different numbers of support vectors (although all with zero training error), and since the generalization ability of the SVM model depends very much on the number of support vectors: is the actual choice of kernel a "parameter to be tuned" as well?
- Is the choice of kernel application specific? Data specific?
- Any rule of thumb?
#2
Although only briefly mentioned in the lectures, machine selection of an appropriate kernel is one of the approaches that may be taken. The caveat is that considering additional kernels increases the complexity of the overall hypothesis set, and with it the risk of overfitting through the selection itself.

I suspect that selection of a kernel, without snooping in the data, is more art than science, but it may be guided by one's understanding (read: intuition) of the expected characteristics of the data.
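One way to do that machine selection while respecting the caveat is nested cross-validation, where the kernel search sits inside an outer loop that estimates the performance of the whole selection procedure. A minimal sketch, not from the thread, assuming scikit-learn and an illustrative toy dataset:

```python
# Nested cross-validation: the kernel search itself is part of what
# gets evaluated, so its added complexity does not bias the estimate.
# The dataset and parameter grids below are made up for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop: select the kernel and its parameters by cross-validation.
inner = GridSearchCV(
    SVC(),
    param_grid=[
        {"kernel": ["linear"], "C": [0.1, 1, 10]},
        {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    ],
    cv=5,
)

# Outer loop: estimate the generalization of the *whole* procedure,
# kernel-selection step included.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```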
#4
The lowest E_cv should be a good measure of generalization for the selected kernel.
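For instance, the kernel can be treated as just another hyperparameter and the choice with the lowest E_cv kept. A sketch, not from the post, assuming scikit-learn (the data and grids are made up for illustration):

```python
# Pick the kernel (and its parameters) with the lowest
# cross-validation error.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

grid = GridSearchCV(
    SVC(),
    param_grid=[
        {"kernel": ["linear"], "C": [0.1, 1, 10]},
        {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
        {"kernel": ["poly"], "C": [0.1, 1, 10], "degree": [2, 3]},
    ],
    cv=5,  # 5-fold cross-validation; best_score_ is 1 - E_cv (accuracy)
).fit(X, y)

print(grid.best_params_, grid.best_score_)
```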
#5
The "best" kernel can be data specific. On the other hand, there are several properties of popular kernels that can serve as rule-of-thumb: * Gaussian-RBF kernel: suitable first-hand choice for general nonlinear learning (classification) --- fewer parameters than polynomial kernel and numerically more stable, with a wide range of fitting power (but requires a careful tuning). S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines with Gaussian kernel . Neural Computation, 15(2003), 1667-1689. C.-W. Hsu, C.-C. Chang, C.-J. Lin. A practical guide to support vector classification . Technical report, Department of Computer Science, National Taiwan University. July, 2003. * Perceptron kernel: similar to Gaussian in performance but with fewer parameters (only ![]() Hsuan-Tien Lin and Ling Li. Support Vector Machinery for Infinite Ensemble Learning. Journal of Machine Learning Research, 9(2), 285-312, 2008. * Linear "kernel": suitable when # feature >> # example, which may suggest that going non-linear is not needed. In that case, there are ultra fast solvers (such as LIBLINEAR) than general dual solvers (LIBSVM). G.-X. Yuan, C.-H. Ho, and C.-J. Lin. Recent Advances of Large-scale Linear Classification. To appear in Proceedings of IEEE, 2012. R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification . Journal of Machine Learning Research 9(2008), 1871-1874. An extension is low-order polynomial "kernel" that use the fast solvers, which can be competitive to Gaussian RBF but much faster (in training and testing) Y.-W. Chang, C.-J. Hsieh, K.-W. Chang, M. Ringgaard, and C.-J. Lin. Training and Testing Low-degree Polynomial Data Mappings via Linear SVM. Journal of Machine Learning Research, 11(2010), 1471-1490. Hope this helps.
__________________
When one teaches, two learn.
#6
Thank you all for the useful input. We have some reading to do... :-)