Should SVMs ALWAYS converge to the same solution given the same data?
Should SVMs ALWAYS converge to the same solution given the same data?
I've been running a few tests on Q7 and I find that if I randomize the order of the data in both the training and test sets, I get different solutions/errors? I am now wondering whether I should be averaging over say a hundred runs and whether I need to go back to previous questions and do the same where necessary? Argh (if so)!? Please can somebody confirm? Maybe my "shuffling" code is incorrect but looks correct to me: Code:
trainingData = ReadCaltechFile('features.train'); 
Re: Should SVMs ALWAYS converge to the same solution given the same data?
Hi
I also found the same problem when I used 10fold with shuffle. See the following: http://book.caltech.edu/bookforum/showthread.php?t=1282 My guess is that if you have data with reasonably well separated features ("good data" in relation to our model), we should get the same result (or more or less the same results) if we have the same data that is processed in different sequences. But if we have a "difficult data", the classification margins may be very narrow and if we handle the data in different sequences we may get different results. I think that is the case with "one against all" in q5 where I do not get "repeatable results". I am skipping shuffle. What is the purpose for randomizing the data in this case? 
Re: Should SVMs ALWAYS converge to the same solution given the same data?
In answer to my original question I found this:
http://compbio.soe.ucsc.edu/genex/ge...tml/node3.html which says: In addition to counteracting overfitting, the SVM's use of the maximum margin hyperplane leads to a straightforward learning algorithm that can be reduced to a convex optimization problem. In order to train the system, the SVM must find the unique minimum of a convex function. Unlike the backpropagation learning algorithm for artificial neural networks, a given SVM will always deterministically converge to the same solution for a given data set, regardless of the initial conditions. For training sets containing less than approximately 5000 points, gradient descent provides an efficient solution to this optimization problem [Campbell and Cristianini, 1999]. So the random results I am seeing seem to oppose the above? Also mentioned at above link which may explain this is: The selection of an appropriate kernel function is important, since the kernel function defines the feature space in which the training set examples will be classified. As long as the kernel function is legitimate, an SVM will operate correctly even if the designer does not know exactly what features of the training data are being used in the kernelinduced feature space. The definition of a legitimate kernel function is given by Mercer's theorem [Vapnik, 1998]: the function must be continuous and positive definite. Human experts often find it easier to specify a kernel function than to specify explicitly the training set features that should be used by the classifier. The kernel expresses prior knowledge about the phenomenon being modeled, encoded as a similarity measure between two vectors. Maybe then I can propose that the polynomial kernel as supplied for the homework doesn't represent the feature space well enough and is the reason for the variation in my results? One last point. I tried using RBF in Q7 instead, and though there were still discrepancies from run to run, they were much much smaller! But before I get carried away, please could somebody confirm they see different results when they shuffle their data sets prior to learning? 
Re: Should SVMs ALWAYS converge to the same solution given the same data?
Thanks Andrs, (sorry I started composing the above before seeing your post). I can only think it is the choice of kernel as I saw this problem in Q7 where it was classifiers 1 v 5. Would be great to hear from one of the Professors on this (please)?

Re: Should SVMs ALWAYS converge to the same solution given the same data?
I see different cross validation errors, but the models always give the same Ein and Eout. There is variation in the svmtrain output, different iterations, values of obj, rho, and even nBSV change, but the models always seem to work the same with svmpredict.

Re: Should SVMs ALWAYS converge to the same solution given the same data?
Quote:
Keith can you confirm that you shuffled your data first? I can understand why the output could be considerably different when cross validation is applied due to what data gets put into each fold. However for a straight learn and predict I hadn't anticipated a "considerable" difference in Ein and Eout when the data is the same but in a different order. 
Re: Should SVMs ALWAYS converge to the same solution given the same data?
Yes I shuffled, ran svmtrain and svmpredict and then shuffled again and repeated many times. I saw NO change in Ein or Eout.

Re: Should SVMs ALWAYS converge to the same solution given the same data?
I have looked in more detail, and it is the larger values of C where the discrepancies occur. I.e., when C >= 100... this may make more sense as from what I remember from the lectures, the higher the value of C, the more you tend towards a "hard" margin, which means the classifier will be more sensitive to noise in the data.
Keith that is unusual (maybe you were using low values of C in your tests)? I did a quick google, this explains it well: http://stackoverflow.com/questions/4...rsoftmargins I will put it down to "sensitivity to noise" cause by high C (harder margins), unless otherwise corrected? For clarity here are my results over 3 runs of Q7 when the power (Q) is set to 5. Each run has a shuffled training and test set: C_________ NoOfSVs____ Eout_______ Ein 1.0000e002 2.3000e+001 2.1226e002 3.8437e003 1.0000e+000 2.1000e+001 2.1226e002 3.2031e003 1.0000e+002 1.3000e+001 2.1226e002 3.8437e003 1.0000e+004 1.1000e+001 3.0660e002 1.7937e002 1.0000e+006 9.0000e+000 6.3679e002 5.1249e002 C_________ NoOfSVs____ Eout_______ Ein 1.0000e002 2.3000e+001 2.1226e002 3.8437e003 1.0000e+000 2.1000e+001 2.1226e002 3.2031e003 1.0000e+002 1.3000e+001 2.3585e002 4.4843e003 1.0000e+004 9.0000e+000 3.3491e001 3.3312e001 1.0000e+006 9.0000e+000 3.3726e001 3.4465e001 C_________ NoOfSVs____ Eout_______ Ein 1.0000e002 2.3000e+001 2.1226e002 3.8437e003 1.0000e+000 2.1000e+001 2.1226e002 3.2031e003 1.0000e+002 1.3000e+001 2.3585e002 3.2031e003 1.0000e+004 9.0000e+000 3.5377e001 3.4145e001 1.0000e+006 1.1000e+001 3.5142e001 3.5106e001 
Re: Should SVMs ALWAYS converge to the same solution given the same data?
This seems to be occurring when there is no convergence, maximum iterations reached, large Ein.

Re: Should SVMs ALWAYS converge to the same solution given the same data?
Mathematically, the primal problem of SVM enjoys a unique solution; the dual, on the other hand, may or may not come with a unique solution. In some degenerate cases, there can be multiple solutions of the dual (that correspond to the same solution of the primal).
Hope this helps. 
All times are GMT 7. The time now is 12:02 PM. 
Powered by vBulletin® Version 3.8.3
Copyright ©2000  2020, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. AbuMostafa, Malik MagdonIsmail, and HsuanTien Lin, and participants in the Learning From Data MOOC by Yaser S. AbuMostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.