LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   The Final (http://book.caltech.edu/bookforum/forumdisplay.php?f=138)
-   -   Question 12 (http://book.caltech.edu/bookforum/showthread.php?t=1505)

Anton Khorev 09-12-2012 11:29 PM

Question 12
 
Looks like there are two answers for Q13: it's possible to get a different number of support vectors with Octave's qp and libsvm.

yaser 09-13-2012 12:04 AM

Re: Question 13
 
Quote:

Originally Posted by Anton Khorev (Post 5215)
Looks like there are two answers for Q13: it's possible to get a different number of support vectors with Octave's qp and libsvm.

Interesting. Is the hypothesis g identical?

MLearning 09-13-2012 10:40 AM

Re: Question 13
 
I think it has to do with the fact that qp (and quadprog in MATLAB) return alpha values that are negligibly small. By setting an appropriate threshold, it is possible to filter out these very small values.

In Homework 7, one of the students introduced a trick as a means to work around the initialization problem in qp (or quadprog). When I applied this trick, qp and libsvm returned different numbers of SVs. However, when I initialize all alphas to a vector of zeros, libsvm and Octave's qp yield the same number of SVs.
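The thresholding idea can be sketched as follows (a minimal Python sketch; the alpha vector is made up for illustration, and the 1e-5 cutoff is an arbitrary choice, not a prescribed value):

```python
import numpy as np

# Hypothetical alphas as a QP solver might return them: the true support
# vectors have clearly positive alphas, while the non-support vectors come
# back as tiny positive noise instead of exact zeros.
alphas = np.array([4.9e-13, 0.75, 1.2e-11, 0.25, 0.5, 3.3e-14, 0.0])

threshold = 1e-5  # values below this are treated as numerically zero
sv_idx = np.flatnonzero(alphas > threshold)

print(sv_idx)       # indices of the surviving support vectors
print(len(sv_idx))
```

Without the threshold, naively counting nonzero alphas here would report six support vectors instead of three.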

Anton Khorev 09-13-2012 11:39 AM

Re: Question 13
 
In this problem the vectors are placed symmetrically. In the qp solution, one of them touches the margin with alpha == 0.

MLearning 09-13-2012 12:41 PM

Re: Question 13
 
Quote:

Originally Posted by Anton Khorev (Post 5236)
In this problem the vectors are placed symmetrically. In the qp solution, one of them touches the margin with alpha == 0.

Symmetric in X space, yes. But are they also symmetric in Z space?

patrickjtierney 09-16-2012 03:35 PM

Re: Question 13
 
This is the only question I got wrong on the final, and I would have gotten it right if I had used my libsvm version of the answer rather than my hand-built version with qp (all in Octave). My qp (wrong!) answer had one support vector fewer than I got with libsvm, and that might only be because I used 10e-012 as a threshold. (If I had omitted the threshold, I would have gotten the same number of SVs as in libsvm :shock:).

I got w = [-0.88889, 5.0e-016] and b = -1.6667 using qp, but strangely I get w = [0.88869, 0] and b = 1.6663 using libsvm. Both have Ein = 0, and over a thousand test runs of a million random points in [-3,3]^2, they agree on labels in 99.999% of cases on average. (For libsvm, I use svmpredict with all labels = +1, which is ~71% accurate :), to get the actual prediction labels.)

The difference in sign may not be significant. I got w and b for qp directly by following the class slides, but I got w = model.SVs'*model.sv_coef and b = - model.rho in the libsvm case (which may not be exactly correct).

The values of alpha (for qp) are different from model.sv_coef, and the qp version uses all but the last of the libsvm support vectors.

So I do agree that there may be 2 correct answers for this question, based on numerical issues and different ways qp and libsvm handle the calculations, but beyond the control of the student.

If required I can PM the alphas and the code I used to support the claim, or wait and post an **answer** after the deadline.

yaser 09-16-2012 06:16 PM

Re: Question 13
 
Quote:

Originally Posted by patrickjtierney (Post 5365)
This is the only question I got wrong on the final, and I would have gotten it right if I had used my libsvm version of the answer rather than my hand-built version with qp (all in Octave). My qp (wrong!) answer had one support vector fewer than I got with libsvm, and that might only be because I used 10e-012 as a threshold. (If I had omitted the threshold, I would have gotten the same number of SVs as in libsvm :shock:).

Thank you for posting this. I have to look into it. The OP seems to have had a similar experience, and I was waiting for a reply to my previous post in this thread.

JohnH 09-16-2012 06:51 PM

Re: Question 13
 
My experience is the same. My intuition indicated the correct answer from the key, but my experiments using QP with Octave consistently gave an answer of one less than that identified by libsvm (even when comparing alpha against a threshold of zero). After completing the final, I went back and tried some additional experiments and discovered that rearranging the order of the training data changed the number of support vectors.

yaser 09-16-2012 07:54 PM

Re: Question 13
 
Can you guys do the following: perturb one of the SVs that are symmetric by a small amount, run your qp programs again, and see if the ambiguity goes away? I will do that myself, but I just wanted more people with different packages to try as well. Thank you.

fgpancorbo 09-16-2012 08:15 PM

Re: Question 13
 
Quote:

Originally Posted by patrickjtierney (Post 5365)
This is the only question I got wrong on the final, and I would have gotten it right if I had used my libsvm version of the answer rather than my hand-built version with qp (all in Octave). My qp (wrong!) answer had one support vector fewer than I got with libsvm, and that might only be because I used 10e-012 as a threshold. (If I had omitted the threshold, I would have gotten the same number of SVs as in libsvm :shock:).

[...]

I haven't submitted my answers yet, but on this one I used libsvm. Regarding how to get w and b, I found this in the libsvm FAQ: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f804

Code:

w = model.SVs' * model.sv_coef;  % weight vector recovered from the support vectors
b = -model.rho;                  % bias term

% libsvm treats the first label it encounters in the training data as the
% positive class, so flip the signs if that label was -1
if model.Label(1) == -1
  w = -w;
  b = -b;
end

The difference from what you did is the last three lines. Note that this would be good for Problem 12 only. I get a different w from yours, though: w = [2 0], b = 1. svmpredict also gives me Ein = 0. My options for 12 were '-s 0 -t 0 -q -h 0 -c 1e10'. For Problem 13, I used libsvm as well, and I get an answer that is among those suggested.

patrickjtierney 09-16-2012 09:09 PM

Re: Question 13
 
Using x = [1 0; 0 1; 0 -1.00002; -1 0; 0 2.0001; 0 -2; -2 0]; I still get one fewer support vector for qp than for libsvm (i.e., the same values I get without perturbing). This remains the case when perturbing only one support vector. The most notable change is that the second weight entry grows, although the first entry and b also change.

Also, thanks to fgpancorbo for the code for getting w & b from libsvm. Useful for the future.

MLearning 09-17-2012 07:47 AM

Re: Question 13
 
Quote:

Originally Posted by patrickjtierney (Post 5374)
Using x = [1 0; 0 1; 0 -1.00002; -1 0; 0 2.0001; 0 -2; -2 0]; I still get one fewer support vector for qp than for libsvm (i.e., the same values I get without perturbing). This remains the case when perturbing only one support vector. The most notable change is that the second weight entry grows, although the first entry and b also change.

Also, thanks to fgpancorbo for the code for getting w & b from libsvm. Useful for the future.

@patrickjtierney,

In z space, X6 and X7 map to the same point: X6 = (0, -2) and X7 = (-2, 0) both map to (3, 5). I wonder if this has any effect on the computation.
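For what it's worth, this is easy to verify numerically (a Python sketch; the transform z1 = x2^2 - 2*x1 - 1, z2 = x1^2 - 2*x2 + 1 is the Q12 transform as I recall it, so treat it as an assumption):

```python
import numpy as np

def z(x):
    # Assumed Q12 nonlinear transform:
    #   z1 = x2^2 - 2*x1 - 1,  z2 = x1^2 - 2*x2 + 1
    x1, x2 = x
    return np.array([x2**2 - 2*x1 - 1, x1**2 - 2*x2 + 1])

x6 = np.array([0.0, -2.0])  # X6
x7 = np.array([-2.0, 0.0])  # X7

print(z(x6))  # both points land on (3, 5)
print(z(x7))
```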

patrickjtierney 09-17-2012 10:26 AM

Re: Question 13
 
Quote:

Originally Posted by MLearning (Post 5415)
@patrickjtierney,

In z space, X6 and X7 map to the same point: X6 = (0, -2) and X7 = (-2, 0) both map to (3, 5). I wonder if this has any effect on the computation.

I noticed that in Q12, but I believe the z space for Q13 is defined by the polynomial kernel, not by the mapping from Q12. This can be seen on slide 10 of Week 15, where g(x) is specified.

MCN12 09-17-2012 03:01 PM

Re: Question 13
 
Using libsvm in MATLAB, I perturbed [0,-1] to [0,-0.94], and it reduced the number of support vectors by one. w and b agree with what others have seen for libsvm.

fgpancorbo 09-17-2012 06:32 PM

Re: Question 13
 
I used libsvm and got this question right.

Anton Khorev 09-17-2012 09:48 PM

Re: Question 13
 
Quote:

Originally Posted by yaser (Post 5217)
Interesting. Is the hypothesis g identical?

Predictions of qp and libsvm are identical (for 10,000 uniformly distributed samples with x1, x2 in [-5, 5]).

JohnH 09-17-2012 09:58 PM

Re: Question 13
 
As a quick (read: minimal-effort) check of model equivalence, I compared the predicted results for 1,000,000 randomly selected points within [-3,3] x [-3,3] using the support vectors from both Octave/QP and Python/libsvm. Only three points were classified differently, despite the difference in the number of support vectors returned by the two approaches. I'm certain that an analytical comparison of the support vectors would prove their equivalence; however, it hardly seems necessary given the empirical results.
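An agreement check of this kind can be sketched as follows (a Python sketch; the w and b values are the ones patrickjtierney reported earlier in the thread, used here only as stand-ins for two fitted models, and his two solutions use opposite sign conventions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperplanes reported earlier in the thread (qp vs. libsvm); they differ
# only by an overall sign flip and a small amount of numerical noise.
w_qp, b_qp = np.array([-0.88889, 5.0e-16]), -1.6667
w_svm, b_svm = np.array([0.88869, 0.0]), 1.6663

# Classify one million random points in [-3, 3]^2 with both models; the
# libsvm predictions are negated to undo the sign-convention flip.
pts = rng.uniform(-3.0, 3.0, size=(1_000_000, 2))
pred_qp = np.sign(pts @ w_qp + b_qp)
pred_svm = -np.sign(pts @ w_svm + b_svm)

agreement = np.mean(pred_qp == pred_svm)
print(agreement)  # very close to 1.0
```

Only points falling in the sliver between the two nearly identical decision boundaries are labeled differently, which is why the disagreement rate is a few parts per million.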

marco.lehmann 06-01-2013 07:32 AM

Re: Question 13
 
I had quite some problems solving this one, and it forced me to play around with different approaches (qp, libsvm). One more approach to consider is this: Lecture 15, slide 5. For the kernel used in Question 13, there is a corresponding transformation, given explicitly on that slide. So why not give it a try? I got some confidence in the result after reading the slide's title. :)
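That kernel-vs-explicit-transform equivalence can be checked numerically (a Python sketch; it assumes the Q13 kernel is the second-order polynomial kernel K(x, x') = (1 + x.x')^2 and uses the standard explicit feature map for that kernel):

```python
import numpy as np

def kernel(x, xp):
    # Second-order polynomial kernel (assumed): K(x, x') = (1 + x.x')^2
    return (1.0 + x @ xp) ** 2

def phi(x):
    # Standard explicit feature map whose inner product reproduces the kernel:
    # (1, sqrt(2)*x1, sqrt(2)*x2, x1^2, x2^2, sqrt(2)*x1*x2)
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

x, xp = np.array([1.0, 0.0]), np.array([0.0, -2.0])
print(kernel(x, xp), phi(x) @ phi(xp))  # the two values match
```

Expanding (1 + x.x')^2 term by term is exactly what produces the sqrt(2) factors in the feature map.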



Powered by vBulletin® Version 3.8.3
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.