LFD Book Forum  

LFD Book Forum > Course Discussions > Online LFD course > Homework 7

  #1
Old 05-20-2013, 10:28 PM
Dorian
Member

Join Date: Apr 2013
Posts: 11
Q9, SVM vs PLA

In Question 9 I would naively have expected that the more training points one has, the closer SVM and PLA become, and thus a more "balanced" percentage of runs in which SVM beats PLA.

I am saying this because with more training points there is less margin for the margin (sorry for the play on words). My program also reached this conclusion, but evidently something went wrong with both the program and my expectation.

Why does the opposite happen, i.e. why does SVM approximate the target function better than PLA as the number of points grows?
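For concreteness, the experiment behind this question can be sketched as follows. This is a minimal illustration in Python (not anyone's actual homework code): generate a random linear target on [-1, 1]^2, train PLA on N separable points, and estimate out-of-sample disagreement by Monte Carlo. Weights returned by an SVM solver can be compared against the target in exactly the same way, and the fraction of runs where the SVM's disagreement is lower is the percentage the question asks about.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_target():
    # Target f: a random line through two points drawn from [-1, 1]^2,
    # encoded as weights w so that f(x) = sign(w . (1, x1, x2)).
    p, q = rng.uniform(-1, 1, (2, 2))
    return np.array([q[0] * p[1] - p[0] * q[1], q[1] - p[1], p[0] - q[0]])

def labels(w, X):
    # X has a leading column of ones (the constant coordinate).
    return np.sign(X @ w)

def pla(X, y):
    # Perceptron learning algorithm: update on a random misclassified point
    # until the (separable) data is classified perfectly.
    w = np.zeros(X.shape[1])
    while True:
        mis = np.nonzero(np.sign(X @ w) != y)[0]
        if len(mis) == 0:
            return w
        i = rng.choice(mis)
        w = w + y[i] * X[i]

def disagreement(w1, w2, n_test=10000):
    # Monte Carlo estimate of P[sign(w1 . x) != sign(w2 . x)].
    X = np.column_stack([np.ones(n_test), rng.uniform(-1, 1, (n_test, 2))])
    return np.mean(labels(w1, X) != labels(w2, X))

# One run with N = 10: PLA's out-of-sample error versus the target.
f = random_target()
X = np.column_stack([np.ones(10), rng.uniform(-1, 1, (10, 2))])
y = labels(f, X)
g = pla(X, y)
print(disagreement(f, g))
```

Repeating this over many runs, and substituting the SVM hypothesis for g in half the comparisons, gives the "SVM beats PLA" percentage.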
  #2
Old 05-20-2013, 10:48 PM
marek
Member

Join Date: Apr 2013
Posts: 31
Re: Q9, SVM vs PLA

Quote:
Originally Posted by Dorian View Post
In Question 9 I would naively have expected that the more training points one has, the closer SVM and PLA become, and thus a more "balanced" percentage of runs in which SVM beats PLA.

I am saying this because with more training points there is less margin for the margin (sorry for the play on words). My program also reached this conclusion, but evidently something went wrong with both the program and my expectation.

Why does the opposite happen, i.e. why does SVM approximate the target function better than PLA as the number of points grows?
From my numbers, the performance of SVM vs PLA didn't change very much at all between N = 10 and N = 100. Granted, I'm not sure my program is functioning properly, since I know I have a few small bugs in the QP.
  #3
Old 05-20-2013, 11:16 PM
yaser
Caltech

Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Q9, SVM vs PLA

Quote:
Originally Posted by Dorian View Post
In Question 9 I would naively have expected that the more training points one has, the closer SVM and PLA become, and thus a more "balanced" percentage of runs in which SVM beats PLA.
Without commenting directly on whether the percentage would go up or down or stay the same, let me just address the quoted point. The fact that there is less room for improvement doesn't necessarily relate to how often SVM would beat PLA, since the percentage reflects being better regardless of how much better it is.
__________________
Where everyone thinks alike, no one thinks very much
  #4
Old 05-20-2013, 11:22 PM
Dorian
Member

Join Date: Apr 2013
Posts: 11
Re: Q9, SVM vs PLA

OK, thinking more about it: maybe this happens because SVM generalizes better, as it has a smaller effective d_VC. Still looking for that bug in my code.
  #5
Old 05-21-2013, 06:51 PM
catherine
Member

Join Date: Apr 2013
Posts: 18
Re: Q9, SVM vs PLA

Same thing here. I used ipop from the kernlab package in R. I checked Ein and b, they behave as expected, and I'm getting the expected number of support vectors. I also plotted the results for one iteration, they match the figures I'm getting. Still the performance of my SVM model is only marginally better than the performance of a perceptron-based model, especially for N = 100.

Here are the results I'm getting:
For N = 10: SVM fares better than PLA in 63.9% of the iterations. |EoutSVM| = 0.09050551, whereas |EoutPLA| = 0.1221962
For N = 100: Even though SVM fares better than PLA in 56.9% of the iterations, |EoutSVM| = 0.01877277, whereas |EoutPLA| = 0.01374174

In a way these results match my expectations, in the sense that PLA catches up with SVM the larger the training set is - though I'm a bit disappointed by the SVM's lack of flamboyance in this particular case. Is this because the data is completely random? They don't match the answer key though, according to which the SVM's performance relative to PLA improves with the number of items in the training set.

Note: Not sure this is relevant - I'm using a test set of 1,000 data points.
  #6
Old 05-22-2013, 03:56 AM
Elroch
Invited Guest

Join Date: Mar 2013
Posts: 143
Re: Q9, SVM vs PLA

You can think of it like this. How big are the sets of misclassified points in the two experiments? How many of your 1000 points are misclassified on average? How accurate an estimate do you think you are getting for each of the misclassified sets?

Actually it's worse than estimating the misclassification error for a single method: if M_{PLA} and M_{SVM} are the two sets of misclassified points, you are only interested in the points that are in one set but not the other.

Note: if you have a fraction f of a set that you are trying to estimate and you use N sample points, it's not difficult to calculate the standard deviation on such an estimate, which you can use to get a very good handle on how reliable your estimates and conclusions are.
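The standard deviation Elroch alludes to is the usual binomial-proportion one, sqrt(f(1-f)/N). A quick sketch (Python used purely for illustration) shows why N = 1000 test points is marginal for the N = 100 case above:

```python
import math

def fraction_stderr(f, n):
    # Standard deviation of the sample-frequency estimate of a true
    # fraction f from n independent draws (binomial proportion).
    return math.sqrt(f * (1 - f) / n)

# With 1000 test points and a true Eout around 0.01, the estimate's
# standard deviation is about 0.003 -- a sizeable fraction of the
# ~0.005 gap between the Eout values reported earlier in the thread.
print(fraction_stderr(0.01, 1000))
```

So the per-run Eout comparisons at N = 100 are quite noisy unless a much larger test set (or many more runs) is used.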
  #7
Old 05-22-2013, 07:23 AM
jlaurentum
Member

Join Date: Apr 2013
Location: Venezuela
Posts: 41
Re: Q9, SVM vs PLA

@Catherine:

I also attempted to use ipop in the R kernlab package. I was having issues with the upper constraint u bounding the alphas. Depending on the u value I used, I'd get more volatility in the differences in the b values (I mean the bias term b in the weights). As many in other threads have pointed out, you never get any alphas exactly equal to zero, just really low values on the order of 10^(-5) or less. Whether I calculated the weight vector by summing over all alphas or by wiping out those alphas close to zero, the bias terms I obtained from the different support vectors were not equal. What really rang the alarm bells, though, was that the in-sample error rate for the solution obtained through the quadratic programming routine was never zero. Furthermore, ipop sometimes returned with errors.

So I opted for using the ksvm function in the same package to obtain the support vector solutions, and then using predict to calculate the out-of-sample error rate (with a large test data set). The ksvm function always returned an in-sample error of zero, and although I got Question 8 correct using it, I didn't get Questions 9 and 10 right.

Could you indicate how you got the ipop function to work? What parameters did you feed it? Did you use "vanilladot" as the kernel function for the H matrix?
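For what it's worth, here is a sketch of the pieces the SVM dual QP needs, written in Python only to make the matrix shapes concrete (how these map onto ipop's exact argument list is an assumption to check against the kernlab documentation). The H here is what kernelPol builds for the linear ("vanilladot") kernel:

```python
import numpy as np

def svm_dual_qp_inputs(X, y, C=1e6):
    # Pieces of the SVM dual quadratic program
    #   minimize  -1'a + (1/2) a' H a
    #   subject to  y'a = 0  and  0 <= a_i <= C,
    # where H[i, j] = y_i * y_j * (x_i . x_j). A very large C
    # approximates the hard-margin case used in this homework.
    n = len(y)
    Yx = y[:, None] * X       # each row x_i scaled by its label y_i
    H = Yx @ Yx.T             # quadratic term
    c = -np.ones(n)           # linear term (maximize sum of alphas)
    A = y.reshape(1, n)       # equality constraint y'a = 0
    l = np.zeros(n)           # lower bounds on the alphas
    u = np.full(n, C)         # upper bounds on the alphas
    return c, H, A, l, u

# Tiny two-point example: one positive and one negative point.
X = np.array([[0.0, 1.0], [0.0, -1.0]])
y = np.array([1.0, -1.0])
c, H, A, l, u = svm_dual_qp_inputs(X, y)
print(H)  # 2x2 matrix of pairwise y_i y_j x_i.x_j values
```

Whatever QP routine you use, checking that H is symmetric positive semidefinite and that the returned alphas satisfy y'a = 0 is a cheap sanity test.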
  #8
Old 05-22-2013, 08:19 AM
Elroch
Invited Guest

Join Date: Mar 2013
Posts: 143
Re: Q9, SVM vs PLA

I managed to get ipop to work with vanilladot. Took me a while before I was completely confident in the results though.

Did you plot your solutions? This is what made me confident I had it spot on. [You can do a pretty good job of SVM with 100 points in 2 dimensions using a ruler (straightedge)]
Did you manage to keep the sampling errors low enough as hinted at in my last post?
  #9
Old 05-22-2013, 11:53 AM
jlaurentum
Member

Join Date: Apr 2013
Location: Venezuela
Posts: 41
Re: Q9, SVM vs PLA

Elroch:

I didn't plot the solutions obtained by ipop because, once I saw that the in-sample error was not zero, that invalidated everything for me. What parameters did you use for ipop?
  #10
Old 05-22-2013, 01:16 PM
Elroch
Invited Guest

Join Date: Mar 2013
Posts: 143
Re: Q9, SVM vs PLA

Quote:
Originally Posted by jlaurentum View Post
Elroch:
I didn't plot the solutions obtained by ipop because, once I saw that the in-sample error was not zero, that invalidated everything for me. What parameters did you use for ipop?
You may have seen a clue from the plot. I recall it helped me.

Essentially, I used the recipe in the R documentation page for ipop, except after a bit of experimentation I changed the value of H to kernelPol(vanilladot(),x,,y) and played about with the cost (I'm still not sure about that - anyone able to clarify?)

I should point out that (possibly due to not being at all familiar with ipop) I wrote a chunk of code to get the hypothesis needed for comparison. Basically it constructed the weight vector from the alphas and the support vectors as described in the lecture, calculated the values of the dot products on all of the support vectors, and then adjusted the first parameter of the weight vector so that the zero line was right in the middle of the support vectors. The main help of visualisation was seeing that the right points were support vectors. I am guessing there is probably a way to do this more directly (by doing the dual?)
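Elroch's recipe above — build the weight vector from the alphas and the support vectors, then pick the bias so the boundary sits in the middle of the support vectors — can be sketched like this (Python used only for illustration; the tolerance for deciding which alphas count as nonzero is an arbitrary choice):

```python
import numpy as np

def recover_hypothesis(alpha, X, y, tol=1e-8):
    # Weight vector from the dual solution: w = sum_i alpha_i y_i x_i.
    w = (alpha * y) @ X
    # Support vectors are the points with alpha_i > 0 (up to solver noise).
    sv = alpha > tol
    # On every support vector, y_i (w . x_i + b) = 1, so b = y_i - w . x_i;
    # averaging over the support vectors smooths out numerical error and
    # centers the boundary between them.
    b = np.mean(y[sv] - X[sv] @ w)
    return w, b

# Two-point example whose dual solution is alpha = (1/2, 1/2):
# the optimal separator is x2 = 0, i.e. w = (0, 1), b = 0.
X = np.array([[0.0, 1.0], [0.0, -1.0]])
y = np.array([1.0, -1.0])
w, b = recover_hypothesis(np.array([0.5, 0.5]), X, y)
print(w, b)
```

A plot of w, b, and the flagged support vectors against the data points is then a direct visual check that the QP picked the right points, which seems to be what made the difference here.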
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.