
#1




Support Vector Machines
I wonder if I'm missing something  I found this lecture to be by far the hardest to understand till now, for the following reasons:
1) I didn't see how it fit. Does it have something to do with the crossvalidation lecture that preceded it? There really wasn't much discussion of how good this method is, or what its Eout is (there's a tiny bit at the very end), or any of the discussion that I'm used to from the whole rest of the course. As a result, 2) I didn't really know what the point of SVM is. There was sort of a handwaving argument that it would be better to have fat margins, and then a mention that it allows very highdimensional models without paying the penalty  because you still hope to end up with just a few support vectors. I got the impression from the argument that the answer will be more robust, so I guess that it generalizes better. 3) I hadn't heard of quadratic programming before, and again that was taken for granted: after you do all this, just pass it to a quadratic programming package... I can look it up on wikipedia, but again disconcerting. 4) The starting point of the lecture was PLA with linearly separable points. But I'm assuming that that isn't required at all, because how useful could that be in general? Again, though, I wasn't clear on it; the simple clarity of the support vectors meaning the points that the margin "bumps into" doesn't work (near as I can see) where we have a nonseparable set of points. What determines how many support vectors you get? 5) This whole lecture isn't in the book at all, which also contributes to my difficulty, since I usually use the book as a second input. 6) Hypothesis: We have moved into the part of the course where we are looking at particular methods, and I missed the transition. Everyone but me knows all about SVM and how great it is, so that was taken for granted in the lecture. Not complaining, but I would like to get my bearings back. Comments? 
#2




Re: Support Vector Machines
Yes, SVM is simply a useful general learning algorithm of the sort we have seen in the PLA algorithm, logistic regression and linear regression (and even neural networks), but one which has advantages for many problems. The general idea of the algorithm seems intuitively comprehensible; the details may be more technical than algorithms for which we have every detail, but I'm not too concerned.
With a classification problem and separable sets, SVM can be thought of as starting where PLA ends. It doesn't just separate data, it separates it in the best possible way, ensuring (quantitatively) better generalisation. But as well as that, a modified version (using regularization) can be very good at dealing with nonseparable classification problems (I believe because it focuses on the critical subset of the data, not the subset that is no problem). Next week I believe we will learn about how SVMs retain these advantages for nonlinear classification (and regression) problems by the use of kernels. Another plus of SVMs is global optimums. NNs have local minima, so aren't guaranteed to get the global best answer, but this doesn't seem to cause many problems and NNs are also a genuinely useful tool. My limited experience tells me that crossvalidation proves very useful with SVMs and avoids the necessity for precise theoretical results about their ability to generalise. 
#3




Re: Support Vector Machines
I agree that this lecture (and to a lesser degree "Regularization"), is more difficult than previous ones. There is quite a bit of Math involved and, since not everything is proven, I had to take some (reasonable) leaps of faith.
The way I understood it, is that it is another Machine Learning method, but a "stateoftheart" one, one that has a natural intuition ("biggest margin") and that, more importantly, works remarkably well in practice. Regarding "Quadratic Programming", it is similar to "Linear Programming", which is nowadays relatively standard Algorithms material (It can be found in CLRS's "Introduction to Algorithms"). So, another leap of faith, I am not surprised that there are algorithms that can solve efficiently these optimization problems, and I feel OK using them as "blackboxes". Finally, regarding "Validation", the 2 lectures just happen to be in the same week, but unrelated, "Validation" being grouped with last week's lectures. Anyway, you are not the only one who finds the lecture challenging and who doesn't know beforehand what SVM's are 
#4




Re: Support Vector Machines
Thanks everyone for your replies; I feel better.
Now I have to do the homework, though, and using a quadratic programming package that I know little about is a little scary, especially the way the answers are entered. General question on the homework: Has anyone found a way to test one's answers on these programming assignments, prior to submitting? It's exciting to submit, but I could do with a little more certainty! I'm the lazy sort of programmer  usually I debug afterwards, by seeing if the answer is right. In this case, I guess the thing to do is graph the result: It should be obvious by eye (at least for the lowN case) if the weighting always falls "in the middle". 
#5




Re: Support Vector Machines
Quote:
I fed the SVM implemetation with just 2 points (N=2): (0, 0) and (1, 1) with respective values 1 and 1. Then I made sure I get as the answer the line/vector I expect: 1 + x + y = 0, that is (1, 1, 1) (Or something very close to that, as they are rounding errors. Also a positive multiple of that vector would do as it specifies the same line/function). Another thing I do is to always check, programmatically, that the vector/line I get from SVM really separates the N points I give as input (In other words I check that E_in = 0). If it's not the case, I abort the program, since this means there is some flaw in my implementation. 
#6




Re: Support Vector Machines
Thanks again.

#7




Re: Support Vector Machines
Quote:
SVM does not build on validation. It is a new classification algoithm for linearly separable data with one added twist: maximizing the margin. May I suggest that you look at the first 6 minutes of this segment again: If you are sold on the idea of maximizing the margin as something beneficial, what the rest of the lecture does is go through the mathematical and algorithmic machinery of maximizing the margin. After that, we do the usual; apply nonlinear transforms to generalize a linear method. I will be happy to discuss this further.
__________________
Where everyone thinks alike, no one thinks very much 
Thread Tools  
Display Modes  

