LFD Book Forum  

  #1  
09-16-2012, 09:45 AM
rainbow
Member

Join Date: Jul 2012
Posts: 41
Which kernel to use?

In the course we have applied the Gaussian, polynomial and linear kernels to different problems and learned how to tune them with respect to regularization to avoid overfitting.

- For a given problem, different kernels seem to return different numbers of support vectors (although all with zero training error). Since the generalization ability of the SVM model depends very much on the number of support vectors, is the choice of kernel itself a "parameter to be tuned" as well?

- Is the choice of kernel application-specific or data-specific?

- Any rule of thumb?
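For reference, the course relates the two through a leave-one-out style bound, E[E_out] ≤ E[#SV] / (N − 1). A minimal Python sketch of that arithmetic; the kernel labels and support-vector counts are hypothetical, and a single run's count only estimates the expected value that appears in the bound.

```python
def sv_generalization_bound(num_support_vectors: int, num_examples: int) -> float:
    """Course bound E[E_out] <= E[#SV] / (N - 1), evaluated on one observed #SV."""
    if num_examples < 2:
        raise ValueError("need at least two training examples")
    return num_support_vectors / (num_examples - 1)


# Hypothetical support-vector counts for three kernels trained on the same N = 101 points.
N = 101
hypothetical_sv_counts = {"kernel_a": 40, "kernel_b": 25, "kernel_c": 10}
for name, n_sv in hypothetical_sv_counts.items():
    print(f"{name}: #SV = {n_sv}, bound on E_out ~ {sv_generalization_bound(n_sv, N):.2f}")
```

Fewer support vectors tighten the bound, but it is only an upper bound, so comparing kernels this way is a rough guide rather than a replacement for validation.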
  #2  
09-16-2012, 11:27 AM
JohnH
Member

Join Date: Jul 2012
Posts: 43
Re: Which kernel to use?

Although only briefly mentioned in the lectures, letting the learning process itself select an appropriate kernel is one possible approach. The caveat is that considering additional kernels increases the complexity of $\mathcal{H}$ and thus requires larger data sets to mitigate the risk of overfitting. It is also possible to apply multiple kernels and aggregate their outputs to produce the final model.
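A closely related idea (combining the kernels themselves rather than aggregating model outputs, which is the starting point of what is usually called multiple kernel learning) rests on the fact that a nonnegative weighted sum of valid kernels is itself a valid kernel. A minimal sketch that checks this numerically with a pure-Python Cholesky test on the Gram matrix; the weights, γ = 1, and the four points are hypothetical.

```python
import math


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def combined_kernel(x, z, weights=(0.5, 0.5)):
    """Nonnegative weighted sum of two valid kernels (hypothetical weights)."""
    w_lin, w_rbf = weights
    assert w_lin >= 0 and w_rbf >= 0, "negative weights can break positive semidefiniteness"
    return w_lin * linear_kernel(x, z) + w_rbf * rbf_kernel(x, z)


def gram_matrix(kernel, points):
    return [[kernel(p, q) for q in points] for p in points]


def is_positive_semidefinite(matrix, jitter=1e-9):
    """Try a Cholesky factorization of (matrix + jitter * I); success means PSD up to jitter."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = matrix[i][i] + jitter - s
                if diag <= 0:
                    return False
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


points = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (-1.0, 0.5)]
K = gram_matrix(combined_kernel, points)
print(is_positive_semidefinite(K))
```

In an actual multiple-kernel setup the weights would themselves be learned or validated, which is exactly the extra complexity in $\mathcal{H}$ mentioned above.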

I suspect that selecting a kernel without snooping into the data is more art than science, but it may be guided by one's understanding (read: intuition) of the expected characteristics of the data.
  #3  
09-16-2012, 12:07 PM
rainbow
Member

Join Date: Jul 2012
Posts: 41
Re: Which kernel to use?

Quote:
Originally Posted by JohnH
The caveat is that considering additional kernels increases the complexity of $\mathcal{H}$ and thus requires larger data sets to mitigate the risk of overfitting.
Good point!

Quote:
I suspect that selecting a kernel without snooping into the data is more art than science, but it may be guided by one's understanding (read: intuition) of the expected characteristics of the data.
So one strategy could be to think in terms of a suitable nonlinear transformation (one that matches the data) and then find a kernel corresponding to that transformation. One of the great benefits of SVMs is that you never visit the feature space explicitly; you only exploit it through the kernel (the kernel trick).
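To make "find a kernel matching the transformation" concrete: for 2-D inputs, the degree-2 polynomial kernel (1 + x·x')² equals the inner product of an explicit 6-dimensional transform Φ. A minimal Python sketch; the helper names are illustrative, and the √2 scaling inside Φ is the particular choice that makes the two agree exactly.

```python
import math


def explicit_transform(x):
    """Explicit degree-2 feature map for 2-D input (6-dimensional Z space)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Kernel trick: the same inner product, computed without forming any Z vector."""
    return (1.0 + dot(x, z)) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
print(dot(explicit_transform(x), explicit_transform(z)))  # explicit route, up to rounding
print(poly2_kernel(x, z))                                  # kernel route
```

The kernel needs one 2-D dot product, while the explicit route builds two 6-dimensional vectors first; for the Gaussian kernel the corresponding Z space is infinite-dimensional, so only the kernel route is available.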
  #4  
09-16-2012, 12:18 PM
Andrs
Member

Join Date: Jul 2012
Posts: 47
Re: Which kernel to use?

Quote:
Originally Posted by JohnH
The caveat is that considering additional kernels increases the complexity of $\mathcal{H}$ and thus requires larger data sets to mitigate the risk of overfitting. It is also possible to apply multiple kernels and aggregate their outputs to produce the final model.
If you do not know much about the data and you are using an SVM, the RBF kernel is a good one to start with; you may also try some other kernels (linear, ...) and/or parameter values. If you have a reasonable number of kernel alternatives, you can use cross-validation to select the kernel that produces the smallest E_cv. Cross-validation for selecting a kernel (among different options) can be used even with a limited amount of data.
The lowest E_cv should be a good measure of the generalization ability of the selected kernel.
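A minimal sketch of choosing a kernel by smallest E_cv. To stay self-contained (standard library only), it uses a dual (kernel) perceptron as a stand-in learner instead of an actual SVM solver; the three candidate kernels, the fold count, the epoch limit, and the XOR-like toy data are all assumptions for illustration. With a real SVM package the selection loop would look the same, with the SVM trained in place of train_kernel_perceptron.

```python
import math
import random


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def poly2_kernel(x, z):
    return (1.0 + linear_kernel(x, z)) ** 2


def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_kernel_perceptron(kernel, xs, ys, epochs=10):
    """Dual (kernel) perceptron: a simple stand-in learner, NOT an SVM solver."""
    alphas = [0] * len(xs)

    def score(x):
        return sum(a * y * kernel(xn, x) for a, xn, y in zip(alphas, xs, ys) if a)

    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            if yi * score(xi) <= 0:  # misclassified or on the boundary: add x_i to the expansion
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return lambda x: 1 if score(x) > 0 else -1


def cross_validation_error(kernel, xs, ys, folds=5):
    """E_cv: fraction of points misclassified when their fold is held out."""
    errors = 0
    for f in range(folds):
        held_out = set(range(f, len(xs), folds))  # interleaved folds
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = train_kernel_perceptron(kernel, train_x, train_y)
        errors += sum(predict(xs[i]) != ys[i] for i in held_out)
    return errors / len(xs)


def select_kernel(candidates, xs, ys, folds=5):
    """Return the candidate name with the smallest E_cv, plus all scores."""
    scores = {name: cross_validation_error(k, xs, ys, folds) for name, k in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical XOR-like toy data: the label is the sign of x1 * x2.
rng = random.Random(0)
xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(40)]
ys = [1 if x1 * x2 > 0 else -1 for x1, x2 in xs]

best, scores = select_kernel(
    {"linear": linear_kernel, "poly2": poly2_kernel, "rbf": rbf_kernel}, xs, ys)
print(best, scores)
```

Picking the minimum over several candidates makes that minimum E_cv slightly optimistic, which matters less when the candidate list is short.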
  #5  
09-16-2012, 05:30 PM
htlin
NTU

Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 601
Re: Which kernel to use?

Quote:
Originally Posted by rainbow
In the course we have applied the Gaussian, polynomial and linear kernels to different problems and learned how to tune them with respect to regularization to avoid overfitting.

- For a given problem, different kernels seem to return different numbers of support vectors (although all with zero training error). Since the generalization ability of the SVM model depends very much on the number of support vectors, is the choice of kernel itself a "parameter to be tuned" as well?

- Is the choice of kernel application-specific or data-specific?

- Any rule of thumb?
Yes, choosing the kernel is like choosing an algorithm/model/hypothesis set, and it is important for SVMs. The validation techniques discussed in class can help in making the choice.

The "best" kernel can be data-specific. On the other hand, several properties of popular kernels can serve as rules of thumb:

* Gaussian-RBF kernel: a suitable first choice for general nonlinear learning (classification). It has fewer parameters than the polynomial kernel and is numerically more stable, with a wide range of fitting power (but it requires careful tuning).

S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15:1667-1689, 2003.

C.-W. Hsu, C.-C. Chang, and C.-J. Lin. A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University, July 2003.

* Perceptron kernel: similar to the Gaussian kernel in performance but with fewer parameters (only C needs to be tuned).

H.-T. Lin and L. Li. Support vector machinery for infinite ensemble learning. Journal of Machine Learning Research, 9(2):285-312, 2008.

* Linear "kernel": suitable when the number of features is much larger than the number of examples, which may suggest that going nonlinear is not needed. In that case, there are solvers (such as LIBLINEAR) that are much faster than general dual solvers (such as LIBSVM).

G.-X. Yuan, C.-H. Ho, and C.-J. Lin. Recent advances of large-scale linear classification. To appear in Proceedings of the IEEE, 2012.

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.

An extension is the low-degree polynomial "kernel", where an explicit low-degree mapping is trained with these fast linear solvers; it can be competitive with Gaussian RBF while being much faster in both training and testing.

Y.-W. Chang, C.-J. Hsieh, K.-W. Chang, M. Ringgaard, and C.-J. Lin. Training and testing low-degree polynomial data mappings via linear SVM. Journal of Machine Learning Research, 11:1471-1490, 2010.
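Several of the rules of thumb above can be checked numerically. A minimal sketch using only the Python standard library; the points, dual coefficients, and kernel parameters are made up for illustration. The perceptron-kernel form Δ − ||x − x'||₂ is assumed here from Lin and Li (2008), and the sparse degree-2 mapping only mimics the idea in Chang et al. (2010); it is not their code or part of LIBLINEAR.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian-RBF kernel exp(-gamma * ||x - z||^2); always in (0, 1]."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def poly_kernel(x, z, degree=10, zeta=1.0, gamma=1.0):
    """Polynomial kernel (zeta + gamma * x.z)^degree; magnitude can explode with degree."""
    return (zeta + gamma * sum(a * b for a, b in zip(x, z))) ** degree


def perceptron_kernel(x, z, delta=0.0):
    """Perceptron kernel delta - ||x - z||_2 (form assumed from Lin and Li, 2008)."""
    return delta - math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def decision_value(kernel, alphas, xs, ys, x, b=0.0):
    """Dual SVM decision value: sum_n alpha_n * y_n * K(x_n, x) + b."""
    return sum(a * y * kernel(xn, x) for a, xn, y in zip(alphas, xs, ys)) + b


def linear_weights(alphas, xs, ys):
    """For the linear kernel, collapse the dual sum into one weight vector w."""
    return [sum(a * y * xn[d] for a, xn, y in zip(alphas, xs, ys)) for d in range(len(xs[0]))]


def sparse_degree2(x):
    """Explicit degree-2 map of a sparse {index: value} vector; phi(x).phi(z) = (1 + x.z)^2."""
    r2 = math.sqrt(2.0)
    items = sorted(x.items())
    phi = {(): 1.0}
    for i, v in items:
        phi[(i,)] = r2 * v
        phi[(i, i)] = v * v
    for p in range(len(items)):
        for q in range(p + 1, len(items)):
            (i, vi), (j, vj) = items[p], items[q]
            phi[(i, j)] = r2 * vi * vj
    return phi


# Hypothetical support vectors and dual coefficients with sum(alpha_n * y_n) == 0,
# as the SVM dual constraint requires.
xs = [(0.0, 1.0), (1.0, 0.0), (2.0, 2.0)]
ys = [1, 1, -1]
alphas = [0.5, 0.25, 0.75]
query = (0.5, -1.0)

# RBF stays bounded, while a degree-10 polynomial kernel is already about 6e12 here.
print(rbf_kernel((3.0, 3.0), (3.0, 3.0)), poly_kernel((3.0, 3.0), (3.0, 3.0)))

# The perceptron kernel's constant delta cancels out of the decision value,
# so only C is left to tune.
print(decision_value(perceptron_kernel, alphas, xs, ys, query),
      decision_value(lambda u, v: perceptron_kernel(u, v, delta=10.0), alphas, xs, ys, query))

# Linear kernel: the dual sum equals w . x, so prediction needs no support vectors.
w = linear_weights(alphas, xs, ys)
print(decision_value(linear_kernel, alphas, xs, ys, query), linear_kernel(w, query))

# A sparse input with 3 nonzeros expands to 1 + 3 + 3 + 3 = 10 nonzero features.
print(len(sparse_degree2({0: 1.0, 5: 2.0, 9: -1.0})))
```

The linear check is why the linear route is cheap at test time: prediction needs only w, not the support vectors, and the explicit degree-2 mapping keeps that property at the cost of more (but still sparse) features.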

Hope this helps.
__________________
When one teaches, two learn.
  #6  
09-18-2012, 01:13 AM
rainbow
Member

Join Date: Jul 2012
Posts: 41
Re: Which kernel to use?

Thank you all for the useful input. We have some reading to do... :-)
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.