Q9, SVM vs PLA
In Question 9 I would naively have expected that the more training points you have, the closer SVM and PLA become, and hence a more "balanced" percentage of runs in which SVM beats PLA.
I say this because with more training points there is less margin for the margin (pardon the pun). My program also concluded this, but evidently something went wrong both with the program and with my expectation :) Why does the opposite happen, i.e. why does SVM approximate the target function better than PLA as the number of points grows? 
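The experimental setup the question refers to (a random target line over [-1, 1]^2, N training points, PLA run to separation) can be sketched as follows. This is a minimal hypothetical Python sketch, not any poster's actual code; the SVM half would need a quadratic programming solver and is omitted.

```python
import random

def pla(points, labels, max_iters=10000):
    """Perceptron learning algorithm on 2-D data with a bias term.
    points: list of (x1, x2); labels: list of +1/-1."""
    w = [0.0, 0.0, 0.0]  # [bias, w1, w2]
    for _ in range(max_iters):
        misclassified = [i for i, (x1, x2) in enumerate(points)
                         if (w[0] + w[1] * x1 + w[2] * x2) * labels[i] <= 0]
        if not misclassified:
            return w  # training set separated, E_in = 0
        i = random.choice(misclassified)  # update on a random misclassified point
        x1, x2 = points[i]
        w = [w[0] + labels[i], w[1] + labels[i] * x1, w[2] + labels[i] * x2]
    return w

# Setup from the homework: random target line, N = 10 points in [-1, 1]^2
random.seed(0)
a, b = random.uniform(-1, 1), random.uniform(-1, 1)  # target: sign(x2 - (a*x1 + b))
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(10)]
labels = [1 if x2 - (a * x1 + b) > 0 else -1 for x1, x2 in points]
w = pla(points, labels)
```

Since the data are linearly separable by construction, PLA is guaranteed to converge, so the returned hypothesis classifies every training point correctly.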
Re: Q9, SVM vs PLA
OK, thinking more about it: maybe this happens because SVM generalizes better, as it has a better effective d_VC. Still looking for that bug in my code :)

Re: Q9, SVM vs PLA
Same thing here. I used ipop from the kernlab package in R. I checked Ein and b; they behave as expected, and I'm getting the expected number of support vectors. I also plotted the results for one iteration, and they match the figures I'm getting. Still, the performance of my SVM model is only marginally better than the performance of a perceptron-based model, especially for N = 100.
Here are the results I'm getting:
For N = 10: SVM fares better than PLA in 63.9% of the iterations; Eout(SVM) = 0.09050551, whereas Eout(PLA) = 0.1221962.
For N = 100: even though SVM fares better than PLA in 56.9% of the iterations, Eout(SVM) = 0.01877277, whereas Eout(PLA) = 0.01374174.
In a way these results (I mean the fact that PLA catches up on SVM the larger the training set is) match my expectations, though I'm a bit disappointed about the SVM's lack of flamboyance in this particular case. Is this because this is completely random data? They don't match the answer key though, according to which the SVM's overall performance as compared to PLA improves with the number of items in the training set. :clueless: Note: not sure this is relevant, but I'm using a test set of 1,000 data points. 
Re: Q9, SVM vs PLA
@Catherine:
I also attempted to use ipop in the R kernlab package. I was having issues with the upper constraint u bounding the alphas. Depending on the u value I used, I'd get more volatility in the differences in the b values (I mean the bias term b in the weights). As many in other threads have pointed out, you never get any alphas equal to zero, just really low values on the order of 10^(-5) or less. Whether I calculated the weight vector by summing over all alphas or by wiping out those alphas close to zero, my bias terms were not equal when I solved for them on the support vectors. What really rang the alarm bells, though, was that the Ein error rate for the proposed solution obtained through the quadratic programming routine was never zero. Furthermore, ipop sometimes returned with errors. So I opted for using the ksvm function in the same package to obtain the support vector solutions, and thereafter using predict to calculate the out-of-sample error rate (with a large test data set). The ksvm function always returned an in-sample error of zero, but although I got Question 8 correct using it, I failed to get Questions 9 and 10 right. Could you indicate how you got the ipop function to work? What parameters did you feed it? Did you use "vanilladot" as the kernel function for the H matrix? 
Re: Q9, SVM vs PLA
I managed to get ipop to work with vanilladot. Took me a while before I was completely confident in the results though.
Did you plot your solutions? This is what made me confident I had it spot on. [You can do a pretty good job of SVM with 100 points in 2 dimensions using a ruler (straightedge).] Did you manage to keep the sampling errors low enough, as hinted at in my last post? 
Re: Q9, SVM vs PLA
Elroch:
I didn't plot the solutions obtained by ipop because, on seeing that the in-sample error was not zero, that invalidated everything for me. What parameters did you use for ipop? 
Re: Q9, SVM vs PLA
Quote:
Essentially, I used the recipe in the R documentation page for ipop, except that after a bit of experimentation I changed the value of H to kernelPol(vanilladot(),x,,y) and played about with the cost (I'm still not sure about that; anyone able to clarify?). I should point out that (possibly due to not being at all familiar with ipop) I wrote a chunk of code to get the hypothesis needed for comparison. Basically it constructed the weight vector from the alphas and the support vectors as described in the lecture, calculated the values of the dot products on all of the support vectors, and then adjusted the first component of the weight vector so that the zero line was right in the middle of the support vectors. The main help of visualisation was seeing that the right points were support vectors. I am guessing there is probably a way to do this more directly (by solving the dual?). 
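The reconstruction described here can be sketched in Python: build w = sum over support vectors of alpha_i * y_i * x_i, then recover b from any support vector s via y_s (w . x_s + b) = 1. The alphas, points, and labels below are made up purely for illustration, not the output of an actual ipop run; with a genuine QP solution, b computed from each support vector should agree, which is exactly the sanity check that failed in an earlier post.

```python
# Hypothetical QP output for illustration only (not from a real solver run)
alphas = [0.7, 0.0, 1.2, 0.5, 1e-9]  # Lagrange multipliers
xs = [(1.0, 2.0), (3.0, -1.0), (-2.0, 0.5), (0.5, 0.5), (2.0, 2.0)]
ys = [1, -1, -1, 1, 1]

tol = 1e-6  # treat tiny alphas as zero; QP solvers rarely return exact zeros
sv = [i for i, a in enumerate(alphas) if a > tol]  # support vector indices

# w = sum over support vectors of alpha_i * y_i * x_i
w = [sum(alphas[i] * ys[i] * xs[i][d] for i in sv) for d in range(2)]

# b from one support vector s: y_s * (w . x_s + b) = 1  =>  b = y_s - w . x_s
s = sv[0]
b = ys[s] - sum(w[d] * xs[s][d] for d in range(2))
```

On real solver output it is worth computing b from every index in sv and checking that the values (nearly) coincide before trusting the hypothesis.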
Re: Q9, SVM vs PLA
@jlaurentum:
These are the parameters I fed to ipop: Code:
H = sweep(XIn[,2:3],MARGIN=1,yIn, '*') 
Re: Q9, SVM vs PLA
Quote:
The u parameter is just a vector of upper bounds for the inequalities, but our problem only has lower bounds. I wanted to use a vector of Infs, but ipop didn't like that, so I just played around to find a value for u that would work. For some reason I found that extremely large values gave errors, but large ones (like the one you used) worked fine. I don't know why either. As you probably realised, all you need to check is that none of the alphas attains the upper bound you use. If that is the case, the upper bounds have had no effect. How did you arrive at your choice of u? 
Re: Q9, SVM vs PLA
Quote:
More straightforward is to make the sample big enough. 1000 is a long way short of what you need, because all except 10-20 of those points are accurately classified by both algorithms. The uncertainty in the estimates is quite apparent.

Suppose you have a method and want to estimate its accuracy. In a number of runs you find that an average of 10 out of 1000 random points are misclassified. Each point is a perfectly random sample from a distribution which has about 1% of one value and 99% of the other. In a single run there is huge uncertainty in this estimate: getting 5 or 15 misclassified points is going to happen. Because this is happening with the misclassified points for each of the two methods, the uncertainty in the difference between them is even larger. The consequence is that the advantage of the better method appears a lot smaller when the sample is small, because this noise in the estimates dominates a rather delicate signal.

Hence I used 100,000 random points, so that the number of misclassified points for each method was a lot more stable. Empirically, this gave quite repeatable results. The uncertainty in the misclassification error of each of the two algorithms can be estimated separately by doing a moderate number of repeat runs (e.g. with 10,000 points each) and looking at the range of values found. You can then even combine the runs together and infer a good estimate of the uncertainty on the combined run (based on the variance of the estimate being inversely proportional to the number of samples).

[Could you give a link to the documentation you mentioned? I can't find a reference to "sweep" in the documentation I used at http://cran.r-project.org/web/packag...ab/kernlab.pdf and I don't quite see what this is doing from the R documentation of this function.] 
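The arithmetic behind this can be sketched numerically. Each test point is a Bernoulli trial with success probability p (the true misclassification rate), so the standard error of the estimated rate from N points is sqrt(p(1-p)/N). The 1% rate below is the rough E_out scale from the posts above.

```python
import math

def stderr(p, n):
    """Standard error of a misclassification rate estimated from n
    independent test points, each misclassified with probability p."""
    return math.sqrt(p * (1.0 - p) / n)

p = 0.01                        # roughly the E_out scale for N = 100 here
se_small = stderr(p, 1_000)     # about 0.0031: a third of p itself
se_large = stderr(p, 100_000)   # about 0.0003: p is now well resolved
```

Growing the test set by a factor of 100 shrinks the standard error by a factor of 10, which is why a 1,000-point test set leaves the small SVM-vs-PLA gap buried in noise while 100,000 points resolve it.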
Re: Q9, SVM vs PLA
Hello Catherine:
I tried your code using the sweep function (which is totally mysterious to me, so like Elroch I'd like to ask how you arrived at this function). I got the following error message (using R in Spanish): Code:
Error in sweep(x[, 2:3], MARGIN = 1, y, "*") : So I tried my version of the H matrix: Code:
H <- kernelPol(vanilladot(),x,,y) which in turn gave: Code:
Error in solve.default(AP, c(c.x, c.y)) : with this upper bound: Code:
u <- matrix(rep(1e3, N)) Ahh... quadratic programming and its mysteries! That's why I gave up on ipop altogether and decided to use ksvm: Code:
x <- as.matrix(training_data[,2:3]) # pull out x_0 
Re: Q9, SVM vs PLA
Hi guys,
Sorry for the confusion: 1. The X matrix in my code excerpt above includes x0 (I used the same matrix for PLA), so leave out the index subsetting if you are using a separate matrix for SVM. 2. sweep(XIn[,2:3], MARGIN=1, yIn, '*') is the same as apply(XIn[,2:3], 2, function(x) {x * yIn}). 3. Here is the kernlab documentation I used: http://cran.r-project.org/web/packag...ab/kernlab.pdf 
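For anyone puzzled by how the two constructions in this thread relate: sweep with MARGIN = 1 scales each data row x_i by its label y_i, and the Gram matrix of those scaled rows has entries y_i y_j (x_i . x_j), which is what kernelPol with the linear vanilladot kernel returns, i.e. the H matrix of the SVM dual. A neutral Python sketch with made-up toy data:

```python
# Toy data, made up for illustration: rows of X are the points x_i
X = [[1.0, 2.0], [3.0, -1.0], [-2.0, 0.5]]
y = [1, -1, -1]
n = len(X)

# Scale each row x_i by its label y_i (the effect of sweep(..., MARGIN = 1, yIn, '*'))
Xy = [[y[i] * v for v in X[i]] for i in range(n)]

# H[i][j] = y_i * y_j * (x_i . x_j): the Gram matrix of the scaled rows,
# matching what kernelPol gives for the linear kernel
H = [[sum(a * b for a, b in zip(Xy[i], Xy[j])) for j in range(n)]
     for i in range(n)]
```

The resulting H is symmetric and positive semi-definite, which is what the QP routine requires for the dual's quadratic term.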
Re: Q9, SVM vs PLA
Thanks Catherine. Does that make sense about the nature of the errors due to sample size?

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. AbuMostafa, Malik MagdonIsmail, and HsuanTien Lin, and participants in the Learning From Data MOOC by Yaser S. AbuMostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.