LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 7 (http://book.caltech.edu/bookforum/forumdisplay.php?f=136)
-   -   Q9, SVM vs PLA (http://book.caltech.edu/bookforum/showthread.php?t=4301)

Dorian 05-20-2013 09:28 PM

Q9, SVM vs PLA
 
In Question 9 I would have expected naively that the more training points one has, the closer SVM and PLA become, and thus a more "balanced" percentage of SVM being better than PLA.

I say this because with more training points there is less room for the margin (sorry for the play on words). My program also concluded this, but obviously something went wrong both with the program and with my expectation :)

Why does the opposite happen, i.e. why does SVM approximate the target function better than PLA as the number of points grows?

marek 05-20-2013 09:48 PM

Re: Q9, SVM vs PLA
 
Quote:

Originally Posted by Dorian (Post 10895)
In Question 9 I would have expected naively that the more training points one has, the closer SVM and PLA become, and thus a more "balanced" percentage of SVM being better than PLA.

I say this because with more training points there is less room for the margin (sorry for the play on words). My program also concluded this, but obviously something went wrong both with the program and with my expectation :)

Why does the opposite happen, i.e. why does SVM approximate the target function better than PLA as the number of points grows?

From my numbers, the performance of SVM vs PLA didn't change very much at all between N = 10 and N = 100. Granted, I'm not sure my program is functioning properly, since I know I have a few small bugs in the QP.

yaser 05-20-2013 10:16 PM

Re: Q9, SVM vs PLA
 
Quote:

Originally Posted by Dorian (Post 10895)
In Question 9 I would have expected naively that the more training points one has, the closer SVM and PLA become, and thus a more "balanced" percentage of SVM being better than PLA.

Without commenting directly on whether the percentage would go up or down or stay the same, let me just address the quoted point. The fact that there is less room for improvement doesn't necessarily relate to how often SVM would beat PLA, since the percentage reflects being better regardless of how much better it is.

Dorian 05-20-2013 10:22 PM

Re: Q9, SVM vs PLA
 
OK, thinking more about it, maybe this happens because SVM generalizes better, as it has a smaller effective dVC. Still looking for that bug in my code :)

catherine 05-21-2013 05:51 PM

Re: Q9, SVM vs PLA
 
Same thing here. I used ipop from the kernlab package in R. I checked Ein and b; they behave as expected, and I'm getting the expected number of support vectors. I also plotted the results for one iteration, and they match the figures I'm getting. Still, the performance of my SVM model is only marginally better than that of the perceptron-based model, especially for N = 100.

Here are the results I'm getting:
For N = 10: SVM fares better than PLA for 63.9% of the iterations. |EoutSVM| = 0.09050551, whereas |EoutPLA| = 0.1221962.
For N = 100: even though SVM fares better than PLA for 56.9% of the iterations, |EoutSVM| = 0.01877277, whereas |EoutPLA| = 0.01374174.

In a way these results (I mean the fact that PLA catches up with SVM the larger the training set is) match my expectations - though I'm a bit disappointed by the SVM's lack of flamboyance in this particular case - is this because the data is completely random? They don't match the answer key though, according to which the SVM's overall performance compared to PLA improves with the number of items in the training set. :clueless:

Note: Not sure this is relevant - I'm using a test set of 1,000 data points.

Elroch 05-22-2013 02:56 AM

Re: Q9, SVM vs PLA
 
You can think of it like this. How big are the sets of misclassified points in the two experiments? How many of your 1000 points are misclassified on average? How accurate an estimate do you think you are getting for each of the misclassified sets?

Actually it's worse than estimating the misclassification error for a single method: if M_{PLA} and M_{SVM} are the two sets of misclassified points, you are only interested in the points that are in one set but not the other.

Note: if a fraction f of a set is what you are trying to estimate and you use N sample points, it's not difficult to calculate the standard deviation of such an estimate, which gives you a very good handle on how reliable your estimates and conclusions are.
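To make that concrete: an estimate of a fraction f from N independent points is a scaled binomial count, so its standard deviation is sqrt(f(1-f)/N). A minimal Python sketch (the figures f = 0.01 and N = 1000 are just the ballpark numbers from this thread, not anyone's actual results):

```python
import math

def stderr_of_fraction(f, n):
    """Standard deviation of an estimated fraction when the true
    fraction is f and the estimate uses n independent sample points."""
    return math.sqrt(f * (1.0 - f) / n)

# With a true misclassification rate of 1% and a 1,000-point test set:
f, n = 0.01, 1000
se = stderr_of_fraction(f, n)
print(se)      # about 0.0031, i.e. roughly +/- 3 points out of the ~10 misclassified
print(n * se)  # the same uncertainty expressed in counts: about 3.1
```

So a single 1,000-point run pins the ~1% error rate down only to within a third of its own size, which is exactly why the SVM-vs-PLA comparison is noisy at that test-set size.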

jlaurentum 05-22-2013 06:23 AM

Re: Q9, SVM vs PLA
 
@Catherine:

I also attempted to use ipop in the R kernlab package. I was having issues with the upper constraint u bounding the alphas. Depending on the u value I used, I'd get more volatility in the differences of the b values (I mean the bias term b in the weights). As many in other threads have pointed out, you never get any alphas exactly equal to zero, just really low values on the order of 10^(-5). Whether I calculated the weight vector by summing over all alphas or by first wiping out the alphas close to zero, the bias terms were not equal when I solved for them on the support vectors. What really rang the alarm bells, though, was that the Ein error rate for the solution returned by the quadratic programming routine was never zero. Furthermore, ipop sometimes returned with errors.

So I opted for using the ksvm function in the same package to obtain the support vector solutions, and thereafter using predict to calculate the out-of-sample error rate (with a large test data set). The ksvm function always returned an in-sample error of zero, but although I got question 8 correct using it, I failed to get questions 9 and 10 correct.

Could you indicate how you got the ipop function to work? What parameters did you feed it? Did you use "vanilladot" as the kernel function for the H matrix?

Elroch 05-22-2013 07:19 AM

Re: Q9, SVM vs PLA
 
I managed to get ipop to work with vanilladot. Took me a while before I was completely confident in the results though.

Did you plot your solutions? This is what made me confident I had it spot on. [You can do a pretty good job of SVM with 100 points in 2 dimensions using a ruler (straightedge)]
Did you manage to keep the sampling errors low enough as hinted at in my last post?

jlaurentum 05-22-2013 10:53 AM

Re: Q9, SVM vs PLA
 
Elroch:

I didn't plot the solutions obtained by ipop, because seeing that the in-sample error was not zero invalidated everything for me. What parameters did you use for ipop?

Elroch 05-22-2013 12:16 PM

Re: Q9, SVM vs PLA
 
Quote:

Originally Posted by jlaurentum (Post 10918)
Elroch:
I didn't plot the solutions obtained by ipop, because seeing that the in-sample error was not zero invalidated everything for me. What parameters did you use for ipop?

You may have seen a clue from the plot. I recall it helped me.

Essentially, I used the recipe in the R documentation page for ipop, except that after a bit of experimentation I changed the value of H to kernelPol(vanilladot(),x,,y) and played about with the cost (I'm still not sure about that - can anyone clarify?)

I should point out that (possibly due to not being at all familiar with ipop) I wrote a chunk of code to get the hypothesis needed for the comparison. Basically it constructed the weight vector from the alphas and the support vectors as described in the lecture, calculated the values of the dot products on all of the support vectors, and then adjusted the first parameter of the weight vector so that the zero line was right in the middle of the support vectors. The main help of the visualisation was seeing that the right points were support vectors. I am guessing there is probably a way to do this more directly (by doing the dual?)
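That reconstruction fits in a few lines. Below is a hedged Python sketch of the recipe described above (an illustration, not Elroch's actual code): build w = sum_i alpha_i y_i x_i, then choose the bias so the boundary sits midway between the closest positive and negative support vectors. The two-point toy dataset and its dual solution alpha = (1/4, 1/4), giving w = (1/2, 1/2) and b = 0, are worked out by hand for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reconstruct_hypothesis(alphas, X, y):
    """Given a dual solution for a hard-margin SVM, rebuild (w, b)."""
    d = len(X[0])
    # w = sum_i alpha_i * y_i * x_i
    w = [sum(a * yi * xi[j] for a, yi, xi in zip(alphas, y, X))
         for j in range(d)]
    # Support vectors are the points with alpha > 0 (up to numerical noise).
    sv = [i for i, a in enumerate(alphas) if a > 1e-9]
    # Center the boundary between the extreme positive and negative SVs.
    pos = min(dot(w, X[i]) for i in sv if y[i] > 0)
    neg = max(dot(w, X[i]) for i in sv if y[i] < 0)
    b = -(pos + neg) / 2.0
    return w, b

# Toy problem: (1,1) labelled +1 and (-1,-1) labelled -1.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]
w, b = reconstruct_hypothesis([0.25, 0.25], X, y)
print(w, b)  # expected solution: w = (1/2, 1/2), bias 0, boundary x1 + x2 = 0
```

For a hard-margin problem all support vectors lie exactly on the margin, so centering the boundary between them is equivalent to solving y_s (w·x_s + b) = 1 for any support vector s.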

catherine 05-23-2013 02:28 AM

Re: Q9, SVM vs PLA
 
@jlaurentum:
These are the parameters I fed to ipop:
Code:

H = sweep(XIn[,2:3],MARGIN=1,yIn, '*')  # rows of the input matrix scaled by the labels
c = matrix(rep(-1,n))                   # linear term of the dual objective
A = t(yIn)                              # with b = 0 and r = 0: sum(alpha_i * y_i) = 0
b = 0
l = matrix(rep(0,n))                    # lower bounds: alpha_i >= 0
u = matrix(rep(1e7,n))                  # large upper bounds standing in for "no upper bound"
r = 0
sv = ipop(c,H,A,b,l,u,r)

I'm not sure why, but I had to tweak u to get a 0 Ein across the board, i.e. for all 1,000 iterations. I have to admit that the technicalities of quadratic programming go a tad over my head :/

Elroch 05-23-2013 02:55 AM

Re: Q9, SVM vs PLA
 
Quote:

Originally Posted by catherine (Post 10929)
@jlaurentum:
These are the parameters I fed to ipop:
Code:

H = sweep(XIn[,2:3],MARGIN=1,yIn, '*')
c = matrix(rep(-1,n))
A = t(yIn)
b = 0
l = matrix(rep(0,n))
u = matrix(rep(1e7,n))
r = 0
sv = ipop(c,H,A,b,l,u,r)

I'm not sure why, but I had to tweak u to get a 0 Ein across the board, i.e. for all 1,000 iterations. I have to admit that the technicalities of quadratic programming go a tad over my head :/

I just came to this forum to discuss this same matter. My last post was confusing, probably because I was a bit confused!

The u parameter is just a vector of upper bounds for the inequalities, but our problem only has lower bounds. I wanted to use a vector of Infs, but ipop didn't like that, so I just played around to find a value for u that would work. For some reason I found that extremely large values gave errors, but merely large ones (like the one you used) worked fine. I don't know why either. As you probably realised, all you need to check is that none of the alphas attains the upper bound you use. If that is the case, the upper bounds have had no effect.

How did you arrive at your choice of H?

catherine 05-23-2013 05:57 AM

Re: Q9, SVM vs PLA
 
Quote:

Originally Posted by Elroch (Post 10907)
You can think of it like this. How big are the sets of misclassified points in the two experiments? How many of your 1000 points are misclassified on average? How accurate an estimate do you think you are getting for each of the misclassified sets?

Actually it's worse than estimating the misclassification error for a single method: if M_{PLA} and M_{SVM} are the two sets of misclassified points, you are only interested in the points that are in one set but not the other.

Note: if you have a fraction f of a set that you are trying to estimate and you use N sample points, it's not difficult to calculate the standard deviation on such an estimate, which you can use to get a very good handle on how reliable your estimates and conclusions are.

Hi Elroch, from your comment above I understand that my test set was too small. How large should it be? How did you go about estimating the 'disagreement' between the target function and the final PLA/SVM hypotheses? According to the HW instructions, this 'disagreement' can either be calculated exactly or approximated by generating a sufficiently large set of points to evaluate it. How would one go about calculating it exactly?

catherine 05-23-2013 06:04 AM

Re: Q9, SVM vs PLA
 
Quote:

Originally Posted by Elroch (Post 10930)

How did you arrive at your choice of H?

I just followed slide 15 (The solution - quadratic programming) and the documentation of the kernlab package.

Elroch 05-23-2013 06:22 AM

Re: Q9, SVM vs PLA
 
Quote:

Originally Posted by catherine (Post 10931)
Hi Elroch, from your comment above I understand that my test set was too small. How large should it be? How did you go about estimating the 'disagreement' between the target function and the final PLA/SVM hypotheses? According to the HW instructions, this 'disagreement' can either be calculated exactly or approximated by generating a sufficiently large set of points to evaluate it. How would one go about calculating it exactly?

Calculating it exactly involves doing some fiddly geometry to determine the area between two lines. The fiddliness is due to the fact that the lines can cross any of the sides of the square; it would be easier if the region were a circle, or if we knew that the crossing point was near the center of the region, in which case the angle between the lines would be enough. I had a look at calculating it in an earlier homework, but decided it wasn't worth the bother.
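For the curious, that fiddly geometry is tractable with convex polygon clipping. The following Python sketch is one way to do it (my own construction, not from the course): write each hypothesis as a half-plane a*x + b*y + c >= 0, clip the square [-1,1]^2 against "f positive and g negative" and vice versa, and sum the two areas with the shoelace formula.

```python
def clip(poly, a, b, c):
    """Sutherland-Hodgman: clip a convex polygon to the half-plane
    a*x + b*y + c >= 0."""
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        fp = a * p[0] + b * p[1] + c
        fq = a * q[0] + b * q[1] + c
        if fp >= 0:
            out.append(p)
        if (fp >= 0) != (fq >= 0):  # this edge crosses the boundary line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def area(poly):
    """Shoelace formula for a simple polygon (0 for degenerate polygons)."""
    return abs(sum(p[0] * q[1] - q[0] * p[1]
                   for p, q in zip(poly, poly[1:] + poly[:1]))) / 2.0

def disagreement_area(f, g, square=((-1, -1), (1, -1), (1, 1), (-1, 1))):
    """Area of the region of [-1,1]^2 where half-planes f and g disagree."""
    fa, fb, fc = f
    ga, gb, gc = g
    plus_minus = clip(clip(list(square), fa, fb, fc), -ga, -gb, -gc)
    minus_plus = clip(clip(list(square), -fa, -fb, -fc), ga, gb, gc)
    return area(plus_minus) + area(minus_plus)

# f: x >= 0 and g: y >= 0 disagree on two quadrants: area 2 out of 4.
print(disagreement_area((1, 0, 0), (0, 1, 0)))  # 2.0
```

Dividing by the total area 4 turns this into the exact disagreement probability under the uniform distribution on the square, with no sampling noise at all.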

More straightforward is to make the sample big enough. 1000 is a long way short of what you need, because all except 10-20 of those points are accurately classified by both algorithms.

The uncertainty in the estimates is quite apparent. Suppose you have a method and want to estimate its accuracy, and over a number of runs you find that an average of 10 out of 1000 random points are misclassified. Each point is a perfectly random sample from a distribution which has about 1% of one value and 99% of the other. In a single run there is huge uncertainty in this estimate: getting 5 or 15 misclassified points will happen regularly. Because this is true of the misclassified points of each of the two methods, the uncertainty in the difference between them is even larger.

The consequence is that the advantage of the better method appears a lot smaller when the sample is small, because this noise in the estimates dominates a rather delicate signal.

Hence I used 100,000 random points, so that the number of misclassified points for each method was a lot more stable. Empirically, this gave quite repeatable results. The uncertainty in the misclassification error of each of the two algorithms can be estimated separately by doing a moderate number of repeat runs (e.g. with 10,000 points each) and looking at the range of values found. You can then even combine the runs and infer a good estimate of the uncertainty of the combined run (based on the variance of the estimate being inversely proportional to the number of samples).
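This effect is easy to reproduce with a quick Monte Carlo sketch in Python. The true error rates of 1.0% and 1.3% are hypothetical, chosen only to be in the ballpark of this experiment, and the test-set sizes are kept modest so the sketch runs quickly:

```python
import random

def flip_rate(err_a, err_b, n_test, trials, rng):
    """Fraction of trials in which method A (truly better, err_a < err_b)
    measures no better than method B on an n_test-point random test set."""
    flips = 0
    for _ in range(trials):
        miss_a = sum(rng.random() < err_a for _ in range(n_test))
        miss_b = sum(rng.random() < err_b for _ in range(n_test))
        if miss_a >= miss_b:
            flips += 1
    return flips / trials

rng = random.Random(0)
small = flip_rate(0.010, 0.013, 1_000, 200, rng)   # 1,000-point test sets
large = flip_rate(0.010, 0.013, 20_000, 100, rng)  # 20x larger test sets
print(small, large)  # the small test set flips the ranking far more often
```

With 1,000 test points the worse method "wins" a substantial fraction of the time, while with the larger test set the ranking is essentially always correct, which is the signal-vs-noise point made above.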

[could you give a link about the documentation you mentioned? I can't find a reference to "sweep" in the documentation I used at http://cran.r-project.org/web/packag...ab/kernlab.pdf and I don't quite see what this is doing from the R documentation of this function.]

jlaurentum 05-23-2013 08:23 AM

Re: Q9, SVM vs PLA
 
Hello Catherine:

I tried your code using the sweep function (which is totally mysterious to me, so like Elroch I'd like to ask how you arrived at it). I got the following error message (my R locale is Spanish):

Code:

Error en sweep(x[, 2:3], MARGIN = 1, y, "*") :
  subíndice fuera de  los límites
Ejecución interrumpida

In other words: subscript out of bounds, execution interrupted.
So I tried my version of the H matrix:

Code:

H <-  kernelPol(vanilladot(),x,,y)
where x and y are the input matrix and output vector respectively. This is what I got:

Code:

Error en solve.default(AP, c(c.x, c.y)) :
  sistema es computacionalmente singular: número de condición recíproco = 1.92544e-16
Calls: ipop -> ipop -> solve -> solve.default
Ejecución interrumpida

Hmmm... So a computationally singular matrix. I played around with the u vector of upper constraints like so:

Code:

u <- matrix(rep(1e3,N))
And it somehow went through (sometimes). However, apart from the fact that the results were complete nonsense, the in-sample error was invariably non-zero. As Elroch remarked, I also had the intuitive realization that if the upper bound was not high enough, some alpha values would always reach it (in this case 1e3 = 1000), and that did not seem right...

Ahh... quadratic programming and its mysteries!

That's why I gave up on ipop altogether and decided to use ksvm:

Code:

x <- as.matrix(training_data[,2:3]) # drop the x_0 column
y <- as.matrix(training_data[,4])
svmmodel <- ksvm(x, y, kernel="vanilladot", C=100, type="C-svc")

But the answers I obtained for Q9 and Q10 were not correct.

catherine 05-24-2013 01:16 AM

Re: Q9, SVM vs PLA
 
Hi guys,

Sorry for the confusion:

1. The X matrix in my code excerpt above includes x0 (I used the same matrix for PLA), so leave out the index sub-setting if you are using a separate matrix for SVM.

2. sweep(XIn[,2:3], MARGIN=1, yIn, '*') is the same as apply(XIn[,2:3], 2, function(x) {x * yIn} )

3. Here is the kernlab documentation I used: http://cran.r-project.org/web/packag...ab/kernlab.pdf
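For anyone not fluent in R's apply family: that sweep call simply multiplies row i of the data matrix by the label y_i. A quick Python equivalent of the same operation (pure illustration, not part of anyone's solution):

```python
def scale_rows_by_labels(X, y):
    """Multiply each row of X by the corresponding label in y,
    i.e. R's sweep(X, MARGIN=1, y, '*')."""
    return [[yi * v for v in row] for row, yi in zip(X, y)]

X = [[0.5, -1.0], [2.0, 3.0]]
y = [1, -1]
print(scale_rows_by_labels(X, y))  # [[0.5, -1.0], [-2.0, -3.0]]
```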

Elroch 05-24-2013 08:26 AM

Re: Q9, SVM vs PLA
 
Thanks Catherine. Does that make sense about the nature of the errors due to sample size?



Powered by vBulletin® Version 3.8.3
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.