03-04-2013, 11:24 AM
yaser
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,478
Re: How many support vectors is too many?

Originally Posted by alternate
Mentioning the VC dimension brings up something I considered briefly.

It's been said that when we make decisions based on seeing the data, we should account for all of the options we considered when we think about generalization. That applies in the extreme case of data snooping, and in the lesser case of cross-validation, which adds a little bit of contamination.

But what about, say, a "failed" SVM? For example, we try an SVM and get back 500 support vectors out of 1000 data points, then decide to change the model because the first one won't generalize.

Realistically, if I then go to a different kernel, a neural network, or something else, that model doesn't care whether it was run before or after another one; it will produce the same result. But I could also see the interpretation where the SVM counts as a space I explored, much like tweaking parameters based on the data. To what degree is that the case? Presumably there is a tradeoff between accepting the weak model and accepting weaker generalization, if any, and I guess that tradeoff could be automated too.
Structural Risk Minimization offers a theoretical approach that accounts for the snooping effect when we use a hierarchy of models.
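To make the idea a bit more concrete, here is a minimal sketch of how a hierarchy of models can be charged for in advance. It is an illustration under stated assumptions, not a definitive recipe: the function names `vc_penalty` and `srm_select` are hypothetical, the penalty uses the standard VC bound with the polynomial bound on the growth function, and splitting the confidence budget as delta / 2^i across the models is just one conventional choice. The key point is that the ordering of the models is fixed before looking at the data, so trying model 2 after model 1 "fails" is already paid for.

```python
import math

def vc_penalty(n, d_vc, delta):
    """VC-style penalty sqrt((8/n) * ln(4 * m_H(2n) / delta)),
    with the growth function bounded by m_H(2n) <= (2n)^d_vc + 1."""
    growth = (2 * n) ** d_vc + 1          # exact integer, no overflow
    log_term = math.log(4.0 / delta) + math.log(growth)
    return math.sqrt(8.0 / n * log_term)

def srm_select(models, n, delta=0.05):
    """Pick the model with the smallest bound E_in + penalty.

    models: list of (name, d_vc, e_in) in an order chosen BEFORE seeing data.
    Model i (1-based) gets confidence budget delta / 2**i (an assumption),
    so the budgets sum to at most delta and the bounds hold simultaneously.
    """
    best = None
    for i, (name, d_vc, e_in) in enumerate(models, start=1):
        bound = e_in + vc_penalty(n, d_vc, delta / 2 ** i)
        if best is None or bound < best[1]:
            best = (name, bound)
    return best

# Hypothetical numbers, purely for illustration.
candidates = [("linear", 3, 0.20), ("quadratic", 6, 0.12), ("rich kernel", 50, 0.02)]
name, bound = srm_select(candidates, n=10000)
print(name, round(bound, 3))
```

Models later in the hierarchy pay a slightly larger penalty because their share of the confidence budget is smaller, which is the price of having considered the earlier ones.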
Where everyone thinks alike, no one thinks very much