SVMs, kernels and logistic regression
This course has been great for getting us to think about issues in applying machine learning techniques. Sometimes this has led to realisations, other times to unresolved questions! I'd like to mention a few for open discussion.
In a sense, SVMs were what got me here. After years of enthusiastic experimentation with neural nets, over the last year I got the impression that SVMs were something even better. I found my way to LIBSVM, and while I managed to get some encouraging results, I also came to the conclusion that I didn't really know what I was doing and needed to learn. I was right. [I am sure there is no connection with the fact that most of my silly errors in this course seem to have involved things like forgetting to pass a parameter to LIBSVM or misreading the number of support vectors. My fault entirely!]
One issue I have with SVMs is that what they typically do differs from what I once assumed. I thought the kernel contained a hidden linear component that would allow an arbitrary linear rescaling of the inputs before the kernel function was applied.
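One way to write such a kernel, as a sketch (the symbols $A$, an arbitrary $p \times p$ matrix, $\gamma > 0$, the usual Gaussian width parameter, and $w_j$, per-dimension weights, are notation introduced here for illustration):

$$
k_A(\mathbf{x}, \mathbf{z})
  = \exp\!\left(-\gamma \,\lVert A(\mathbf{x} - \mathbf{z}) \rVert^2\right)
  = k_{\mathrm{RBF}}(A\mathbf{x}, A\mathbf{z})
$$

$$
A = \operatorname{diag}(w_1, \dots, w_p)
\quad\Longrightarrow\quad
k_A(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\gamma \sum_{j=1}^{p} w_j^2 \,(x_j - z_j)^2\right)
$$

Because $k_A$ is just an ordinary Gaussian kernel evaluated at $A\mathbf{x}$ and $A\mathbf{z}$, it remains a valid positive semi-definite kernel for any $A$. A diagonal $A$ gives exactly a per-dimension rescaling: a large $w_j$ means fine resolution along dimension $j$, a small one means coarse resolution, and $w_j = 0$ ignores that dimension entirely.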
I am interested in this idea because in some applications it is far from clear that all dimensions are equally important. Typically, each dimension gets rescaled according to its range, and the kernel is then applied on the assumption that the range in every dimension is equally relevant to how the function varies. It's certainly a reasonable default, but it is not difficult to imagine examples where one would want high resolution in some dimensions and low resolution in others. So the idea of a kernel that combines a linear transform with a Gaussian one, and therefore offers a richer set of hypotheses, is interesting. Of course, that extra expressive power comes at a cost (a greater risk of overfitting), but perhaps some form of regularization and cross-validation would tame this.
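A minimal sketch of such a kernel, in pure Python so it runs without LIBSVM installed. The helper names (`apply_linear`, `linear_rbf`, `libsvm_precomputed`) and the example matrix `A` are hypothetical. The point it illustrates is that putting a linear map inside a Gaussian kernel gives the same value as applying the map to the data first and then using an ordinary Gaussian kernel (LIBSVM's RBF kernel is `exp(-gamma*|u-v|^2)`), so the idea can already be tried with a standard RBF SVM by transforming the features beforehand. If you would rather keep the original features and supply kernel values directly, LIBSVM also accepts a precomputed kernel (`-t 4`); the formatter below assumes the text layout described in LIBSVM's README for that mode, where the `0:` column holds the 1-based index of the training instance.

```python
import math


def apply_linear(A, x):
    """Compute A @ x for a matrix A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rbf(x, z, gamma=1.0):
    """Ordinary isotropic Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def linear_rbf(A, x, z, gamma=1.0):
    """Gaussian kernel with a linear map inside: exp(-gamma * ||A (x - z)||^2)."""
    diff = [a - b for a, b in zip(x, z)]
    return math.exp(-gamma * sum(v * v for v in apply_linear(A, diff)))


def libsvm_precomputed(A, X, X_train, labels, gamma=1.0):
    """Rows '<label> 0:<i> 1:K(x, x_1) ... L:K(x, x_L)' for LIBSVM's -t 4 mode.

    When X is the training set, i is the required 1-based instance ID;
    for test rows LIBSVM ignores the 0: value.
    """
    lines = []
    for i, (x, label) in enumerate(zip(X, labels), start=1):
        values = " ".join(
            f"{j}:{linear_rbf(A, x, xt, gamma):.10g}"
            for j, xt in enumerate(X_train, start=1)
        )
        lines.append(f"{label} 0:{i} {values}")
    return lines


# Hypothetical example: fine resolution in x_1, coarse resolution in x_2.
A = [[1.0, 0.0], [0.0, 0.1]]
x, z = [0.2, 3.0], [0.5, -4.0]

# Same value whether A sits inside the kernel or is applied to the data first.
print(linear_rbf(A, x, z), rbf(apply_linear(A, x), apply_linear(A, z)))
for line in libsvm_precomputed(A, [x, z], [x, z], [1, -1]):
    print(line)
```

Choosing `A`, or just its diagonal weights, then becomes a model-selection problem, for example a cross-validated search over a small grid alongside the usual `C` and `gamma`.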
Secondly, I have an intuitive feeling that SVMs may give good results because they are a computationally efficient approximation to what I have since learnt is called kernel logistic regression. The idea occurred to me a while back, and it is good to see that some very smart people are working on it. I'll be very interested to see how the import vector machine concept develops, and whether it might even turn out somewhat superior to SVMs as a general tool. It's nice to have a probabilistic interpretation and multiclass classification built in. [At present I suspect it mainly comes down to computational demands, with both methods learning very well. Is this so?]
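To make the comparison concrete, here is a minimal pure-Python sketch; it is not LIBSVM's solver or any published import vector machine implementation. Both methods fit a function of the form f(x) = Σ α_i k(x_i, x) with a penalty on its norm, and differ mainly in the loss applied to the margin y·f(x). The SVM's hinge loss is exactly zero once the margin reaches 1, which is why SVM solutions are sparse (only support vectors get nonzero α_i). The logistic loss never reaches zero, so every α_i stays nonzero, while σ(f(x)) serves as an estimate of P(y = +1 | x); that loss of sparsity is the computational cost the import vector machine tries to reduce by keeping only a selected subset of points. The function names, the toy data, the kernel width `gamma` and the regularization weight `lam` are all assumptions for illustration, and plain functional gradient descent stands in for the more efficient solvers used in practice.

```python
import math


def rbf(x, z, gamma=0.5):
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def hinge_loss(m):
    """SVM loss on the margin m = y * f(x); exactly zero once m >= 1."""
    return max(0.0, 1.0 - m)


def logistic_loss(m):
    """KLR loss log(1 + exp(-m)) on the margin; mathematically never zero."""
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def fit_klr(X, y, gamma=0.5, lam=0.1, iters=500):
    """Fit f(x) = sum_i alpha_i k(x_i, x) by minimising
    sum_i logistic_loss(y_i f(x_i)) + (lam / 2) * ||f||^2 (RKHS norm)."""
    n = len(X)
    K = [[rbf(a, b, gamma) for b in X] for a in X]
    alpha = [0.0] * n
    # Gaussian kernel values are <= 1, so the largest eigenvalue of K is <= n
    # and the curvature is at most n/4 + lam; step = 1 / that bound is stable.
    step = 1.0 / (n / 4.0 + lam)
    for _ in range(iters):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        # Derivative of the logistic loss with respect to f_i.
        g = [-y[i] * sigmoid(-y[i] * f[i]) for i in range(n)]
        # The RKHS (functional) gradient has coefficients g + lam * alpha.
        alpha = [alpha[i] - step * (g[i] + lam * alpha[i]) for i in range(n)]
    return alpha


def predict_proba(X, alpha, x, gamma=0.5):
    """Estimated P(y = +1 | x) = sigmoid(f(x))."""
    return sigmoid(sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X)))


# Hypothetical toy data: two well-separated clusters labelled +1 and -1.
X = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5), (-1.0, -1.0), (-1.5, -1.0), (-1.0, -1.5)]
y = [1, 1, 1, -1, -1, -1]
alpha = fit_klr(X, y)
print([round(predict_proba(X, alpha, x), 3) for x in X])  # estimated P(y=+1 | x)
print(all(a != 0.0 for a in alpha))  # True: no training point drops out
```

Multiclass classification would replace the sigmoid with a softmax over one function per class; that extension is left out here to keep the sketch short.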
Any comments, especially from those familiar with how the field is developing, will be most welcome.
