Q9, SVM vs PLA
In Question 9 I would naively have expected that the more training points one has, the closer SVM and PLA become, and thus a more "balanced" percentage of runs where SVM beats PLA.
I say this because with more training points you have less margin for the margin (sorry for the play on words). My program also reached this conclusion, so obviously something went wrong both with the program and with my expectation :) Why does the opposite happen, i.e. why does SVM approximate the target function better than PLA as the number of points grows?
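For concreteness, here is a minimal sketch of the experiment as I understand it, assuming the usual Q9 setup: a random target line in [-1,1]^2, N training points, and E_out estimated by disagreement on fresh test points. The helpers run_pla and run_svm are placeholders for one's own solvers; each is assumed to take (X, y) and return a classifier function of new points.

Code:
    run_experiment <- function(N, n_test = 1000) {
      p <- matrix(runif(4, -1, 1), 2)               # two random points define the target f
      f <- function(X) sign((p[1, 2] - p[1, 1]) * (X[, 2] - p[2, 1]) -
                            (p[2, 2] - p[2, 1]) * (X[, 1] - p[1, 1]))
      X <- matrix(runif(2 * N, -1, 1), N)
      y <- f(X)
      if (length(unique(y)) < 2) return(run_experiment(N, n_test))  # redraw one-sided samples
      Xt <- matrix(runif(2 * n_test, -1, 1), n_test)
      c(pla = mean(run_pla(X, y)(Xt) != f(Xt)),     # E_out estimated by disagreement with f
        svm = mean(run_svm(X, y)(Xt) != f(Xt)))
    }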
Re: Q9, SVM vs PLA
OK, thinking more about it, maybe this happens because SVM generalizes better, as it has a better effective d_VC. Still looking for that bug in my code :)
Re: Q9, SVM vs PLA
Same thing here. I used ipop from the kernlab package in R. I checked E_in and b; they behave as expected, and I'm getting the expected number of support vectors. I also plotted the results for one iteration; they match the figures I'm getting. Still, the performance of my SVM model is only marginally better than the performance of the perceptron-based model, especially for N = 100.

Here are the results I'm getting:

For N = 10: SVM fares better than PLA in 63.9% of the iterations; average E_out for SVM = 0.09050551, whereas for PLA it is 0.1221962.
For N = 100: even though SVM fares better than PLA in 56.9% of the iterations, average E_out for SVM = 0.01877277, whereas for PLA it is 0.01374174.

In a way these results (I mean the fact that PLA catches up with SVM the larger the training set is) match my expectations - though I'm a bit disappointed by the SVM's lack of flamboyance in this particular case - is this because the data is completely random? They don't match the answer key, though, according to which the SVM's performance relative to PLA improves with the number of items in the training set. :clueless:

Note: Not sure this is relevant - I'm using a test set of 1,000 data points.
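For comparison of bookkeeping, a sketch of the tally over runs, reusing the run_experiment sketch from earlier in the thread (1,000 runs is just an assumption):

Code:
    runs <- replicate(1000, run_experiment(100))   # 2 x 1000 matrix with rows "pla", "svm"
    mean(runs["svm", ] < runs["pla", ])            # fraction of runs where SVM beats PLA
    rowMeans(runs)                                 # average E_out for each algorithm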
Re: Q9, SVM vs PLA
@Catherine:
I also attempted to use ipop in the R kernlab package. I was having issues with the upper constraint u bounding the alphas: depending on the value of u I used, I'd get more volatility in the differences between the b values (I mean the bias term b in the weights). As many have pointed out in other threads, you never get any alphas exactly equal to zero, just really low values on the order of 10^(-5) or less. Whether I calculated the weight vector by summing over all alphas or by wiping out the alphas close to zero, the bias terms I obtained from the different support vectors were not equal. What really rang the alarm bells, though, was that the E_in error rate for the solution returned by the quadratic programming routine was never zero. Furthermore, ipop sometimes returned with errors.

So I opted for using the ksvm function in the same package to obtain the support vector solutions, and thereafter using predict to calculate the out-of-sample error rate (with a large test data set). The ksvm function always returned an in-sample error of zero, but although I got Question 8 correct using it, I failed to get Questions 9 and 10 right.

Could you indicate how you got the ipop function to work? What parameters did you feed it? Did you use "vanilladot" as the kernel function for the H matrix?
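For what it's worth, here is a minimal sketch of how the hard-margin dual can be handed to ipop, assuming X is the N x 2 matrix of training points and y the vector of +/-1 labels; the finite upper bound u has to stand in for infinity, so its value (1e6 here) is an assumption:

Code:
    library(kernlab)

    # ipop solves: min c'a + (1/2) a' H a  subject to  b <= A a <= b + r,  l <= a <= u
    N <- nrow(X)
    Q <- (y %*% t(y)) * (X %*% t(X))     # Q_ij = y_i * y_j * (x_i . x_j)
    sol <- ipop(c = rep(-1, N),          # objective: -sum(alpha) + (1/2) alpha' Q alpha
                H = Q,
                A = t(y), b = 0, r = 0,  # equality constraint: sum_i y_i * alpha_i = 0
                l = rep(0, N),           # alphas are non-negative
                u = rep(1e6, N))         # large finite u approximates the hard margin
    alpha <- primal(sol)                 # the alphas of the dual solution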
Re: Q9, SVM vs PLA
I managed to get ipop to work with vanilladot. It took me a while before I was completely confident in the results, though.
Did you plot your solutions? This is what made me confident I had it spot on. [You can do a pretty good job of SVM with 100 points in 2 dimensions using a ruler (straightedge).] Did you manage to keep the sampling errors low enough, as hinted at in my last post?
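To be concrete, a sketch of the kind of sanity-check plot meant here, assuming a weight vector w, bias b, and support-vector indices sv_idx have already been pulled out of the solver (all three names are placeholders):

Code:
    plot(X, col = ifelse(y > 0, "blue", "red"), pch = 19, asp = 1)
    abline(a = -b / w[2], b = -w[1] / w[2])      # separating line: w . x + b = 0
    points(X[sv_idx, , drop = FALSE], cex = 2)   # circle the support vectors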
Re: Q9, SVM vs PLA
Elroch:
I didn't plot the solutions obtained by ipop, because seeing that the in-sample error was not zero invalidated everything for me. What parameters did you use for ipop?
Re: Q9, SVM vs PLA
Essentially, I used the recipe on the R documentation page for ipop, except that after a bit of experimentation I changed the value of H to kernelPol(vanilladot(),x,,y) and played about with the cost (I'm still not sure about that - anyone able to clarify?).

I should point out that (possibly due to not being at all familiar with ipop) I wrote a chunk of code to get the hypothesis needed for comparison. Basically it constructed the weight vector from the alphas and the support vectors as described in the lecture, calculated the values of the dot products on all of the support vectors, and then adjusted the first parameter of the weight vector so that the zero line was right in the middle of the support vectors.

The main help of visualisation was seeing that the right points were support vectors. I am guessing there is probably a way to do this more directly (by doing the dual?)
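A sketch of that reconstruction, with alpha, X, and y as before; the 1e-5 cutoff for treating an alpha as numerically zero is an assumption:

Code:
    sv_idx <- which(alpha > 1e-5)                 # indices of the support vectors
    w <- colSums(alpha[sv_idx] * y[sv_idx] * X[sv_idx, , drop = FALSE])
    scores <- X[sv_idx, , drop = FALSE] %*% w     # w . x on each support vector
    # put the zero line midway between the innermost points of the two classes:
    b <- -(max(scores[y[sv_idx] == -1]) + min(scores[y[sv_idx] == +1])) / 2
    h <- function(Xnew) sign(Xnew %*% w + b)      # the resulting hypothesis

The more direct route is probably to read b off any single support vector via y_s * (w . x_s + b) = 1, which should give the same line for a hard-margin solution.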