itooam, Senior Member (Join Date: Jul 2012, Posts: 100)
Re: Should SVMs ALWAYS converge to the same solution given the same data?

In answer to my original question, I found this:

http://compbio.soe.ucsc.edu/genex/ge...tml/node3.html

which says:

In addition to counteracting overfitting, the SVM's use of the maximum margin hyperplane leads to a straightforward learning algorithm that can be reduced to a convex optimization problem. In order to train the system, the SVM must find the unique minimum of a convex function. Unlike the backpropagation learning algorithm for artificial neural networks, a given SVM will always deterministically converge to the same solution for a given data set, regardless of the initial conditions. For training sets containing less than approximately 5000 points, gradient descent provides an efficient solution to this optimization problem [Campbell and Cristianini, 1999].
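To see what the quote means by "deterministically converge... regardless of the initial conditions", here is a minimal sketch of my own (a toy 1-D convex function, not the actual SVM objective): gradient descent on f(w) = (w - 3)^2 reaches the unique minimum w* = 3 from any starting point.

```python
# Toy illustration: gradient descent on the convex function f(w) = (w - 3)^2.
# Because f is convex with a unique minimum, the starting point w0 does not
# matter -- every run converges to the same solution, w* = 3.
def minimize(w0, lr=0.1, steps=300):
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)  # gradient of (w - 3)^2 is 2(w - 3)
    return w
```

With a non-convex objective (like a neural network loss), different starting points can land in different local minima, which is exactly the contrast the quote draws with backpropagation.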

So the random results I am seeing seem to contradict the above. The same link also mentions something that may explain this:

The selection of an appropriate kernel function is important, since the kernel function defines the feature space in which the training set examples will be classified. As long as the kernel function is legitimate, an SVM will operate correctly even if the designer does not know exactly what features of the training data are being used in the kernel-induced feature space. The definition of a legitimate kernel function is given by Mercer's theorem [Vapnik, 1998]: the function must be continuous and positive definite. Human experts often find it easier to specify a kernel function than to specify explicitly the training set features that should be used by the classifier. The kernel expresses prior knowledge about the phenomenon being modeled, encoded as a similarity measure between two vectors.
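As a sanity check on what "legitimate kernel" means in practice, here is a small sketch (my own toy check, using a hypothetical 1-D polynomial kernel, not necessarily the one from the homework): a Mercer kernel must produce a positive semidefinite Gram matrix, which can be spot-checked by verifying that the quadratic form z^T K z is non-negative for random vectors z.

```python
import random

def poly_kernel(x, y, d=2):
    # Hypothetical 1-D polynomial kernel for illustration;
    # the homework's actual kernel may differ.
    return (x * y + 1.0) ** d

# Gram matrix K[i][j] = k(x_i, x_j) on a few sample points.
pts = [-2.0, -0.5, 0.0, 1.0, 3.0]
K = [[poly_kernel(a, b) for b in pts] for a in pts]

# Mercer spot check: for a positive semidefinite K, z^T K z >= 0
# for every z (up to floating-point noise).
rng = random.Random(42)
for _ in range(1000):
    z = [rng.uniform(-1.0, 1.0) for _ in pts]
    q = sum(z[i] * K[i][j] * z[j]
            for i in range(len(pts)) for j in range(len(pts)))
    assert q >= -1e-9
```

Note this only rules kernels out (a single negative quadratic form proves K is not PSD); passing the spot check is evidence, not proof, of legitimacy.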

Maybe then I can propose that the polynomial kernel supplied for the homework doesn't represent the feature space well enough, and that this is the reason for the variation in my results?

One last point: I tried using the RBF kernel in Q7 instead, and though there were still discrepancies from run to run, they were much, much smaller!

But before I get carried away, please could somebody confirm they see different results when they shuffle their data sets prior to learning?
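For what it's worth, here is the kind of check I have in mind, sketched with plain full-batch gradient descent on a convex (L2-regularized logistic) loss instead of a real SVM solver, on made-up data: because the batch gradient is a sum over all training points, shuffling the data before training makes essentially no difference to the solution.

```python
import math
import random

# Made-up toy data, ((x1, x2), label in {-1, +1}) -- not the homework set.
data = [((1.0, 2.0), 1), ((2.0, 1.5), 1), ((1.5, 0.5), 1),
        ((-1.0, -1.0), -1), ((-2.0, -0.5), -1), ((0.5, -1.5), -1)]

def train(points, steps=5000, lr=0.1, lam=0.1):
    """Full-batch gradient descent on an L2-regularized logistic loss.

    The objective is strictly convex, so it has a unique minimum, and the
    batch gradient is a sum over all points -- the data order cannot matter.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = float(len(points))
    for _ in range(steps):
        gw = [0.0, 0.0]
        gb = 0.0
        for (x1, x2), y in points:
            m = y * (w[0] * x1 + w[1] * x2 + b)
            s = -y / (1.0 + math.exp(m))  # derivative of log(1 + e^(-m)) in m, times y
            gw[0] += s * x1
            gw[1] += s * x2
            gb += s
        w[0] -= lr * (gw[0] / n + lam * w[0])
        w[1] -= lr * (gw[1] / n + lam * w[1])
        b -= lr * gb / n
    return w, b

w1, b1 = train(data)

shuffled = data[:]
random.Random(0).shuffle(shuffled)
w2, b2 = train(shuffled)

# The two solutions agree to within floating-point noise.
diff = max(abs(w1[0] - w2[0]), abs(w1[1] - w2[1]), abs(b1 - b2))
```

So if shuffling noticeably changes an SVM's output, my guess is it points at the optimizer (stochastic steps, heuristics, or loose tolerances in the quadratic-programming solver) rather than at the convex objective itself.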