06-12-2013, 03:32 PM
htlin
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 601
Re: SVMs, kernels and logistic regression

Originally Posted by Elroch
One issue I have with SVMs is that what they typically do differs from what I once assumed. I thought there was a hidden linear component in the kernel that would allow an arbitrary linear rescaling of the inputs before the kernel function is applied.

The reason I am interested in this idea is that, in some applications, it is far from clear that all dimensions are equal. Typically, each dimension is rescaled according to its range, and then the kernel is applied on the assumption that the range in every dimension is equally relevant to the way the function varies. That is certainly a reasonable default, but it is not difficult to imagine examples where one would want high resolution in some dimensions and low resolution in others. So the idea of a kernel that combines a linear and a Gaussian transform to give more hypotheses is interesting. Of course, the greater power comes at a cost, but perhaps some form of regularization and cross-validation would tame it.
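
The kernel described in the quoted question can be sketched in a few lines of plain Python: apply a per-dimension (diagonal) linear rescaling, then a standard Gaussian kernel. This is a minimal illustration, not code from the thread; the function names and the `scales` parameter are hypothetical, and in practice the scales would still have to be chosen, for example by cross-validation. Kernels of this form are often called automatic relevance determination (ARD) kernels.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    """Isotropic Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def rescale(x, scales):
    """Diagonal linear map x -> diag(scales) x (hypothetical helper)."""
    return [s * a for s, a in zip(scales, x)]

def ard_gaussian_kernel(x, z, scales, gamma=1.0):
    """Gaussian kernel applied after a per-dimension rescaling.

    A large scale gives high resolution in that dimension; a scale of 0
    makes the kernel ignore that dimension entirely.
    """
    return gaussian_kernel(rescale(x, scales), rescale(z, scales), gamma)

x = [1.0, 5.0]
z = [2.0, 0.0]
print(gaussian_kernel(x, z))                   # both dimensions count equally: exp(-26)
print(ard_gaussian_kernel(x, z, [1.0, 0.0]))   # second dimension ignored: exp(-1), about 0.368
```

With all scales equal to 1 this reduces to the usual Gaussian kernel, so the extra freedom is exactly the per-dimension resolution the question asks about.
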
There is a rich and ongoing literature on multiple kernel learning (MKL) that may match your thoughts here. MKL learns a convex combination of kernels, which is equivalent to rescaling the feature transforms hidden under the individual kernels. From my limited experience, though, it is very difficult to control the greater power in MKL.
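
To illustrate the equivalence between combining kernels and rescaling their hidden transforms, here is a minimal sketch, assuming two base kernels with explicit feature maps (a linear kernel and the homogeneous quadratic kernel (x·z)^2) and hand-picked weights `betas`. An actual MKL solver would learn the weights rather than fix them; this only checks that the weighted kernel equals an ordinary inner product after each base feature map is scaled by sqrt(beta_m) and the results are concatenated. All names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_features(x):
    # feature map of the linear kernel x . z
    return list(x)

def quadratic_features(x):
    # explicit feature map of the homogeneous degree-2 kernel (x . z)^2
    return [a * b for a in x for b in x]

def combined_kernel(x, z, betas, feature_maps):
    """Convex combination sum_m beta_m * k_m(x, z), k_m(x, z) = <phi_m(x), phi_m(z)>."""
    if any(b < 0 for b in betas) or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("betas must be nonnegative and sum to 1")
    return sum(b * dot(phi(x), phi(z)) for b, phi in zip(betas, feature_maps))

def combined_features(x, betas, feature_maps):
    """Concatenate the base feature vectors, each scaled by sqrt(beta_m)."""
    out = []
    for b, phi in zip(betas, feature_maps):
        out.extend(math.sqrt(b) * f for f in phi(x))
    return out

x, z = [1.0, 2.0], [3.0, 1.0]
maps = [linear_features, quadratic_features]
betas = [0.25, 0.75]
print(combined_kernel(x, z, betas, maps))   # 0.25 * 5 + 0.75 * 25, about 20.0
print(dot(combined_features(x, betas, maps),
          combined_features(z, betas, maps)))  # same value, up to rounding
```

If each base kernel is instead a linear kernel on a single input coordinate, the same identity says that learning the weights amounts to learning a separate scale sqrt(beta_d) for each dimension, which is close to the rescaling asked about in the question.
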

Originally Posted by Elroch
Secondly, I have an intuitive feeling that SVMs may give good results because they are a computationally efficient approximation to what I have learnt is called kernel logistic regression. The concept occurred to me a while back, and it is good to see that some very smart people are working on it. I'll be very interested to see how the import vector machine concept develops, and whether it might even be somewhat superior to SVMs as a general tool. It's nice to have a probabilistic interpretation and multi-class classification built in. [At present I suspect it mainly comes down to computational demands, with both methods learning very well. Is this so?]
In my experience, SVM, KLR and other related approaches do lead to similar performance *if well-tuned*. I don't think any one approach is particularly "superior" to the others in terms of practical performance. The "if well-tuned" is a pretty big assumption, though. Over the past ten years, the development of many tools, along with the effort to make large-scale training computationally feasible, has made SVM easier to tune than the other approaches. That may be one important reason for the success of SVM.
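
One way to see why well-tuned SVM and KLR behave similarly: with the same kernel k, both (by the representer theorem) fit a function of the form f(x) = sum_n alpha_n k(x_n, x) + b under a regularizer, and they differ mainly in the loss applied to the margin y·f(x). The minimal sketch below, in plain Python with hypothetical names, compares the SVM hinge loss with the KLR logistic loss, rescaled by 1/ln 2 so that both are upper bounds on the 0/1 error.

```python
import math

def zero_one_loss(margin):
    # classification error, counting a zero margin as an error
    return 1.0 if margin <= 0 else 0.0

def hinge_loss(margin):
    # SVM loss: max(0, 1 - y f(x))
    return max(0.0, 1.0 - margin)

def scaled_logistic_loss(margin):
    # KLR loss log(1 + exp(-y f(x))), divided by ln 2 so it equals 1 at margin 0
    return math.log1p(math.exp(-margin)) / math.log(2)

print(" margin  0/1   hinge  logistic")
for m in [-2.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0]:
    print(f"{m:7.1f} {zero_one_loss(m):4.1f} {hinge_loss(m):7.3f} {scaled_logistic_loss(m):9.3f}")
```

Both losses are convex and penalize small or negative margins in much the same way, which fits the observation that the two methods perform similarly once tuned. The visible difference is for large margins: the hinge loss is exactly zero once the margin reaches 1, so the SVM solution depends only on the support vectors, while the logistic loss stays positive everywhere, so KLR generally keeps every alpha_n nonzero. Recovering that kind of sparsity is part of the motivation for the import vector machine mentioned in the question.
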

Hope this helps.
When one teaches, two learn.