#1




Q9, SVM vs PLA
In Question 9 I would naively have expected that the more training points one has, the closer SVM and PLA become, and thus the more "balanced" the percentage of runs in which SVM is better than PLA.
I say this because with more training points you have less margin for the margin (pardon the pun). My program also concluded this, but obviously something went wrong with both the program and my expectation. Why does the opposite happen, i.e. why does SVM approximate the target function better than PLA with more points?
#2




Re: Q9, SVM vs PLA
Quote:

#3




Re: Q9, SVM vs PLA
Without commenting directly on whether the percentage would go up or down or stay the same, let me just address the quoted point. The fact that there is less room for improvement doesn't necessarily relate to how often SVM would beat PLA, since the percentage reflects being better regardless of how much better it is.
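To make the "how often" versus "how much" distinction concrete, here is a small Python sketch. The error values are invented purely for illustration (they are not results from the homework), and the names `e_a`, `e_b` and `win_rate` are hypothetical. Method A wins most runs, yet method B has the lower average error.

```python
# Hypothetical per-run out-of-sample errors, invented for illustration only.
e_a = [0.010, 0.010, 0.010, 0.010, 0.100]  # slightly better in 4 runs, much worse in 1
e_b = [0.012, 0.012, 0.012, 0.012, 0.020]


def win_rate(a, b):
    """Fraction of runs in which a has strictly lower error than b."""
    wins = sum(1 for x, y in zip(a, b) if x < y)
    return wins / len(a)


def mean(xs):
    """Plain arithmetic mean."""
    return sum(xs) / len(xs)


rate_a = win_rate(e_a, e_b)
mean_a, mean_b = mean(e_a), mean(e_b)
print(f"A wins {rate_a:.0%} of runs; mean error A = {mean_a:.4f}, B = {mean_b:.4f}")
```

The win percentage only counts which method came out ahead in each run, so it can move independently of the average gap between the two.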
__________________
Where everyone thinks alike, no one thinks very much 
#4




Re: Q9, SVM vs PLA
OK, thinking more about it, maybe this happens because SVM generalizes better, as it has a better effective VC dimension (dVC). Still looking for that bug in my code.

#5




Re: Q9, SVM vs PLA
Same thing here. I used ipop from the kernlab package in R. I checked Ein and b, they behave as expected, and I'm getting the expected number of support vectors. I also plotted the results for one iteration, and they match the figures I'm getting. Still, the performance of my SVM model is only marginally better than that of a perceptron-based model, especially for N = 100.
Here are the results I'm getting:
For N = 10: SVM fares better than PLA in 63.9% of the iterations. Eout(SVM) = 0.09050551, whereas Eout(PLA) = 0.1221962.
For N = 100: SVM fares better than PLA in 56.9% of the iterations, even though Eout(SVM) = 0.01877277, whereas Eout(PLA) = 0.01374174.
In a way these results (I mean the fact that PLA catches up with SVM the larger the training set is) match my expectations, though I'm a bit disappointed by the SVM's lack of flamboyance in this particular case. Is this because this is completely random data? They don't match the answer key though, according to which the SVM's overall performance as compared to PLA improves with the number of items in the training set.
Note: not sure this is relevant, but I'm using a test set of 1,000 data points.
#6




Re: Q9, SVM vs PLA
You can think of it like this. How big are the sets of misclassified points in the two experiments? How many of your 1000 points are misclassified on average? How accurate an estimate do you think you are getting for each of the misclassified sets?
Actually it's worse than if you wanted to estimate the misclassification error for one method: if A and B are the two sets of misclassified points, you are only interested in the points that are in one set but not the other. Note: if you have a fraction of a set that you are trying to estimate and you use N sample points, it's not difficult to calculate the standard deviation of such an estimate, which you can use to get a very good handle on how reliable your estimates and conclusions are.
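The standard deviation mentioned here can be sketched in a few lines of Python, assuming the test points are drawn independently so that the count of misclassified points is binomial. The function name `proportion_std` and the error rates in the loop are illustrative assumptions, not values from anyone's run.

```python
import math


def proportion_std(p, n):
    """Standard deviation of a fraction p estimated from n independent sample points."""
    return math.sqrt(p * (1.0 - p) / n)


n_test = 1000  # test-set size mentioned earlier in the thread
for p in (0.10, 0.02, 0.01):  # illustrative error rates (assumed)
    sd = proportion_std(p, n_test)
    print(f"p = {p:.2f}: std = {sd:.4f} (about {sd * n_test:.1f} points out of {n_test})")
```

Under these assumptions, a true error rate near 0.02 measured on 1,000 test points has a standard deviation of roughly 0.0044 in a single run, which is about the same size as the gap between the two N = 100 averages reported above. Estimating only the points on which the two methods disagree involves an even smaller fraction, so a larger test set may be needed before the per-run comparison is reliable.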
#7




Re: Q9, SVM vs PLA
@Catherine:
I also attempted to use ipop in the R kernlab package. I was having issues with the upper constraint u bounding the alphas. Depending on the u value I used, I'd get more volatility in the differences between the b values (I mean the bias term b in the weights). As many in other threads have pointed out, you never get any alphas exactly equal to zero, just really low values on the order of <10^(-5). Whether I calculated the weight vector by summing over all alphas or by wiping out the alphas close to zero, my bias terms were not equal when I solved for them on the support vectors. What really rang the alarm bells, though, was that the Ein error rate for the proposed solution obtained through the quadratic programming routine was never zero. Furthermore, ipop sometimes returned with errors.
So I opted for the ksvm function in the same package to obtain the support vector solutions, and then used predict to calculate the out-of-sample error rate (with a large test data set). The ksvm function always returned an in-sample error of zero but, although I got question 8 correct using it, I failed to get questions 9 and 10 right.
Could you indicate how you got the ipop function to work? What parameters did you feed it? Did you use "vanilladot" as the kernel function for the H matrix?
#8




Re: Q9, SVM vs PLA
I managed to get ipop to work with vanilladot. Took me a while before I was completely confident in the results though.
Did you plot your solutions? This is what made me confident I had it spot on. [You can do a pretty good job of SVM with 100 points in 2 dimensions using a ruler (straightedge)] Did you manage to keep the sampling errors low enough as hinted at in my last post? 
#9




Re: Q9, SVM vs PLA
Elroch:
I didn't plot the solutions obtained by ipop because, once I saw that the in-sample error was not zero, that invalidated everything for me. What parameters did you use for ipop?
#10




Re: Q9, SVM vs PLA
Quote:
Essentially, I used the recipe in the R documentation page for ipop, except that after a bit of experimentation I changed the value of H to kernelPol(vanilladot(),x,,y) and played about with the cost (I'm still not sure about that. Is anyone able to clarify?).
I should point out that (possibly due to not being at all familiar with ipop) I wrote a chunk of code to get the hypothesis needed for comparison. Basically, it constructed the weight vector from the alphas and the support vectors as described in the lecture, calculated the values of the dot products on all of the support vectors, and then adjusted the first parameter of the weight vector (the bias) so that the zero line was right in the middle of the support vectors. The main help of visualisation was seeing that the right points were support vectors. I am guessing there is probably a way to do this more directly (by doing the dual?).
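For readers following along, the post-processing described above might look roughly like this in Python. It is a sketch under stated assumptions and not the poster's actual R code: the function name `svm_hypothesis`, the tolerance `tol` used to decide which alphas count as nonzero, and the toy data are all hypothetical.

```python
def svm_hypothesis(alphas, xs, ys, tol=1e-5):
    """Recover (w, b) from dual coefficients of a linear hard-margin SVM.

    alphas: dual variables returned by a QP solver
    xs:     training points (lists of coordinates, no constant term)
    ys:     labels, +1 or -1
    tol:    alphas at or below this are treated as zero (assumed threshold)
    """
    # Indices of the support vectors: alphas meaningfully above zero.
    sv = [i for i, a in enumerate(alphas) if a > tol]
    dim = len(xs[0])
    # w = sum over support vectors of alpha_i * y_i * x_i
    w = [sum(alphas[i] * ys[i] * xs[i][d] for i in sv) for d in range(dim)]
    # Dot products w.x on the support vectors of each class.
    scores = {i: sum(wd * xd for wd, xd in zip(w, xs[i])) for i in sv}
    lowest_pos = min(s for i, s in scores.items() if ys[i] > 0)
    highest_neg = max(s for i, s in scores.items() if ys[i] < 0)
    # Put the zero line midway between the two classes of support vectors.
    b = -(lowest_pos + highest_neg) / 2.0
    return w, b, sv


# Toy data (hypothetical): the third point is not a support vector,
# so a solver would typically return a tiny but nonzero alpha for it.
xs = [[0.0, 0.0], [2.0, 0.0], [3.0, 1.0]]
ys = [-1, 1, 1]
alphas = [0.5, 0.5, 1e-9]
w, b, sv = svm_hypothesis(alphas, xs, ys)
print(w, b, sv)
```

Taking the midpoint of the closest positive and negative scores sidesteps the problem of slightly different b values from different support vectors, since small numerical errors in the alphas are averaged out rather than trusted individually.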