LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 8 (http://book.caltech.edu/bookforum/forumdisplay.php?f=137)
-   -   on the right track? (http://book.caltech.edu/bookforum/showthread.php?t=4044)

Sendai 02-27-2013 06:57 PM

on the right track?
 
I thought it would be nice to have a way to check if we're on the right track with problems 2-5 without giving away the answers. I ran SVM (with the polynomial kernel) for a couple of cases and pasted the results below. Are others getting the same numbers?

0 vs 7 classifier, C=0.01, Q=2
number of support vectors = 861
E_{in} = 0.071778
E_{out} = 0.063241

2 vs 8 classifier, C=0.1, Q=3
number of support vectors = 721
E_{in} = 0.234878
E_{out} = 0.291209
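
(For anyone who wants a starting point for reproducing these: below is a minimal sketch along the lines of what I ran, using scikit-learn. The file names features.train/features.test and the row format "digit intensity symmetry" are assumptions about your local copy of the data.)

Code:

import numpy as np
from sklearn import svm

def load(fname):
    # assumed row format: digit intensity symmetry
    data = np.loadtxt(fname)
    return data[:, 0].astype(int), data[:, 1:]

def subset(digits, X, a, b):
    # keep only digits a and b; label a as +1 and b as -1
    mask = (digits == a) | (digits == b)
    return X[mask], np.where(digits[mask] == a, 1.0, -1.0)

train_d, train_X = load('features.train')
test_d, test_X = load('features.test')

for a, b, C, Q in [(0, 7, 0.01, 2), (2, 8, 0.1, 3)]:
    X, y = subset(train_d, train_X, a, b)
    Xt, yt = subset(test_d, test_X, a, b)
    clf = svm.SVC(kernel='poly', degree=Q, C=C, gamma=1.0, coef0=1.0)
    clf.fit(X, y)
    print(a, 'vs', b,
          '#SV =', clf.n_support_.sum(),
          'E_in =', np.mean(clf.predict(X) != y),
          'E_out =', np.mean(clf.predict(Xt) != yt))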

Anjoola 02-28-2013 12:00 AM

Re: on the right track?
 
Hi, I got ALMOST the same numbers, except I got 860 support vectors for the first one instead of 861, and as a result my E_in is slightly different from yours. How did you choose your support vectors? Did you just check for \alpha > 0, or \alpha \ge some small threshold?

hemphill 02-28-2013 11:25 AM

Re: on the right track?
 
I got exactly the same figures as the original poster. I'm using libsvm with the C programming language.

Sendai 02-28-2013 12:59 PM

Re: on the right track?
 
Quote:

Originally Posted by Anjoola (Post 9587)
How did you choose your support vectors? Did you just check for \alpha > 0, or \alpha \ge some small threshold?

I'm using libsvm via scikit-learn and Python, and it takes care of all of that for you.

For the previous week's homework, I looked for alpha greater than 10^{-5}.
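
(If you're rolling your own QP solver instead, the check might look like this sketch, where alphas is whatever your solver returns:)

Code:

import numpy as np

def count_support_vectors(alphas, threshold=1e-5):
    # anything above the numerical-noise threshold counts as a support vector
    return int(np.sum(np.asarray(alphas) > threshold))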

Since we're all getting basically the same numbers, I have more confidence that I'm doing it right.

ivankeller 02-28-2013 02:02 PM

Re: on the right track?
 
Thanks Sendai, that was a good idea.
I'm using scikit-learn too; it's a pretty nice Python module.

Your results helped me to figure out that I needed to set the parameters gamma and coef0 in sklearn.svm.SVC(...) to 1. These parameters don't appear in the lecture. Now I've got the same results.
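
(Concretely, the call that seems to reproduce the lecture's kernel is the one below; setting gamma = coef0 = 1 turns scikit-learn's (gamma * <x, x'> + coef0)^degree into (1 + x^T x')^Q.)

Code:

from sklearn import svm

# gamma=1, coef0=1 give the (1 + x^T x')^Q kernel from the lectures
clf = svm.SVC(kernel='poly', degree=2, gamma=1.0, coef0=1.0, C=0.01)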

ilya239 03-01-2013 03:54 PM

Re: on the right track?
 
Quote:

Originally Posted by Sendai (Post 9581)
I thought it would be nice to have a way to check if we're on the right track with problems 2-5 without giving away the answers. I ran SVM (with the polynomial kernel) for a couple of cases and pasted the results below. Are others getting the same numbers?

0 vs 7 classifier, C=0.01, Q=2
number of support vectors = 861
E_{in} = 0.071778
E_{out} = 0.063241

2 vs 8 classifier, C=0.1, Q=3
number of support vectors = 721
E_{in} = 0.234878
E_{out} = 0.291209

Got similar numbers with Python and CVXOPT. As another check, I got four margin support vectors for 0 vs 7 and six for 2 vs 8.
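
(For anyone else going the QP route, a minimal sketch of the dual set-up I mean is below; K is the precomputed kernel matrix and y the vector of ±1 labels. The margin support vectors are the ones with 0 < \alpha_i < C.)

Code:

import numpy as np
from cvxopt import matrix, solvers

def solve_svm_dual(K, y, C):
    # soft-margin SVM dual: min (1/2) a'Pa - 1'a
    # subject to 0 <= a_i <= C and y'a = 0, with P_ij = y_i y_j K_ij
    n = len(y)
    P = matrix(np.outer(y, y) * K)
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(np.asarray(y, dtype=float).reshape(1, -1))
    b = matrix(0.0)
    a = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    margin_sv = (a > 1e-5) & (a < C - 1e-5)  # alphas strictly inside (0, C)
    return a, margin_sv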

Sendai 03-02-2013 04:02 PM

Re: on the right track?
 
Quote:

Originally Posted by ilya239 (Post 9630)
Got similar numbers with python and cvxopt. As another check, got four margin support vectors for 0 vs 7, six for 2 vs 8.

I get three and five respectively using libsvm via Python and scikit-learn.

Suhas Patil 03-03-2013 08:27 AM

Re: on the right track?
 
I'm trying libsvm through C, with the following parameters:

param.svm_type = C_SVC;        /* C-support vector classification */
param.kernel_type = POLY;      /* (gamma*u'v + coef0)^degree */
param.degree = 2;              /* Q */
param.gamma = 1;               /* gamma = coef0 = 1 gives the (1 + x'x)^Q kernel */
param.coef0 = 1;
param.nu = 0.5;                /* not used by C_SVC */
param.cache_size = 200;        /* kernel cache size, in MB */
param.C = 0.01;
param.eps = 1e-3;              /* stopping tolerance */
param.p = 0.1;                 /* epsilon-SVR only; not used by C_SVC */
param.shrinking = 1;           /* use the shrinking heuristics */
param.probability = 0;
param.nr_weight = 0;           /* no per-class reweighting of C */
param.weight_label = NULL;
param.weight = NULL;

but I'm getting E_in of 0.350 for 0-versus-7 classification. I'm also unable to find a good explanation of these parameters anywhere. Any help?

Thanks in advance.

Suhas Patil 03-03-2013 09:23 AM

Re: on the right track?
 
I found the issue. Thanks for the reply from butterscotch. The problem was with the way I was initializing the 'svm_node' structure after reading the training data.

butterscotch 03-03-2013 09:25 AM

Re: on the right track?
 
Seems good to me. Are you getting the same number of support vectors as in Sendai's post? You might want to verify how you calculate the error: the sv_coef values are not just "alpha", but "y*alpha".
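
(In libsvm's convention the final hypothesis is then

g(x) = \text{sign}\big(\sum_i \text{sv\_coef}_i \, K(x_i, x) - \rho\big)

where \rho, stored in the model, is the negative of the bias b.)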

Suhas Patil 03-03-2013 09:38 AM

Re: on the right track?
 
I verified the 0-versus-7 case and I am getting exactly the same number of support vectors (and also the same E_in and E_out).
I haven't explored using the sv coefficients for calculating the error. I am using this API from 'svm.h':
double svm_predict(const struct svm_model *model, const struct svm_node *x);
It returns the predicted class value. I call this method in a loop for all the test points and compare against the ground truth (the y array from svm_problem) to compute E_in or E_out.
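
(For the record, in libsvm's Python bindings the same loop collapses to one call. A sketch, with 'train.txt' standing in for a file you've already written out in libsvm format:)

Code:

from svmutil import svm_read_problem, svm_train, svm_predict

# -t 1: polynomial kernel; -d: degree; -g: gamma; -r: coef0; -c: C
y, x = svm_read_problem('train.txt')
model = svm_train(y, x, '-t 1 -d 2 -g 1 -r 1 -c 0.01')
labels, acc, vals = svm_predict(y, x, model)
e_in = 1.0 - acc[0] / 100.0  # acc[0] is the classification accuracy in percent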

Anne Paulson 03-04-2013 09:25 AM

Re: on the right track?
 
Thanks, thanks, thanks!

Like Ivan Keller, I at first wasn't setting the gamma and coef0 parameters.

I know what gamma is for the radial kernel, but what does it mean for the polynomial kernel? And what is coef0? The bias? If so, why would the default be 0? Wouldn't you usually want an intercept?

ilya239 03-04-2013 09:33 AM

Re: on the right track?
 
Quote:

Originally Posted by Anne Paulson (Post 9701)
I know what gamma is for the radial kernel, but what does it mean for the polynomial kernel? And what is coef0? The bias? If so, why would the default be 0? Wouldn't you usually want an intercept?

see http://scikit-learn.org/stable/modul...rnel-functions

also, from the python docs:
kernel : string, optional (default='rbf')
    Specifies the kernel type to be used in the algorithm.
    It must be one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or
    a callable.
    If none is given, 'rbf' will be used. If a callable is given it is
    used to precompute the kernel matrix.

degree : int, optional (default=3)
    Degree of kernel function.
    It is significant only in 'poly' and 'sigmoid'.

gamma : float, optional (default=0.0)
    Kernel coefficient for 'rbf' and 'poly'.
    If gamma is 0.0 then 1/n_features will be used instead.

coef0 : float, optional (default=0.0)
    Independent term in kernel function.
    It is only significant in 'poly' and 'sigmoid'.

kartikeya_t@yahoo.com 03-04-2013 10:08 AM

Re: on the right track?
 
Quote:

Originally Posted by butterscotch (Post 9679)
Seems good to me. Are you getting the same number of support vectors as in Sendai's post? You might want to verify how you calculate the error: the sv_coef values are not just "alpha", but "y*alpha".

Thanks butterscotch for pointing this out about the sv coefficients. I have been looking at getting the errors on the test data without relying on the svm-predict function. Once I run the training using svm-train, I take the resulting model file and extract the support vectors and their coefficients, taking care that the coefficients are "y*alpha".
If my understanding is correct, the support vectors are points from the input data set (in particular, the points that are "supporting" the decision boundary), so I expect the support vectors reported in the model file to appear in the raw training data. But for some reason I do not see that: none of the support vectors that the package calculates are in the raw data.
Am I missing something? And how would one go about constructing the final hypothesis from the support vectors and coefficients reported in the model file?
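
(For the second question, a generic sketch of evaluating the final hypothesis from the model file's pieces, assuming this homework's polynomial kernel with gamma = coef0 = 1; rho is the value stored in the model, with bias b = -rho:)

Code:

import numpy as np

def poly_kernel(u, v, Q):
    # the homework's kernel: K(u, v) = (1 + u.v)^Q
    return (1.0 + np.dot(u, v)) ** Q

def final_hypothesis(x, support_vectors, sv_coef, rho, Q):
    # sv_coef[i] is y_i * alpha_i as stored in the model file;
    # libsvm's decision value is sum_i sv_coef[i] * K(sv_i, x) - rho
    s = sum(c * poly_kernel(sv, x, Q) for c, sv in zip(sv_coef, support_vectors))
    return np.sign(s - rho)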

Anne Paulson 03-04-2013 10:20 AM

Re: on the right track?
 
Now I have a different problem (sorry to bug you all, and thanks for your help). I'm getting the right (or at least the same) results as the rest of you, but now I can't get answers to Q5 and Q6: more than one statement comes out true, and the numbers are not increasing/decreasing monotonically.

Suggestions? Hints?

Anne Paulson 03-04-2013 10:24 AM

Re: on the right track?
 
Never mind: "goes down" is to be interpreted as "goes down monotonically".

alternate 03-04-2013 10:46 AM

Re: on the right track?
 
As per another thread, when it says it goes up or goes down, it means it goes strictly, not monotonically.

Anne Paulson 03-04-2013 10:59 AM

Re: on the right track?
 
Right. Strictly, that's what I meant to say.

boulis 03-05-2013 06:07 AM

Re: on the right track?
 
Quote:

Originally Posted by Sendai (Post 9581)
I thought it would be nice to have a way to check if we're on the right track with problems 2-5 without giving away the answers. I ran SVM (with the polynomial kernel) for a couple of cases and pasted the results below. Are others getting the same numbers?

0 vs 7 classifier, C=0.01, Q=2
number of support vectors = 861
E_{in} = 0.071778
E_{out} = 0.063241

2 vs 8 classifier, C=0.1, Q=3
number of support vectors = 721
E_{in} = 0.234878
E_{out} = 0.291209

Very good idea. I am using LIBSVM with Python. I got the exact same results, with the only slight difference being that the number of SVs for the second case was 722.

Code:

0 vs 7, Q=2, C=0.01 => Ein: 0.0717781402936 SV#: 861 Eout: 0.0632411067194
2 vs 8, Q=3, C=0.1 => Ein: 0.234878240377 SV#: 722 Eout: 0.291208791209


alasdairj 05-24-2013 04:25 AM

Re: on the right track?
 
Quote:

Originally Posted by Suhas Patil (Post 9678)
I found the issue. Thanks for the reply from butterscotch. The problem was with the way I was initializing the 'svm_node' structure after reading the training data.

I too am getting E_in of 0.35 for 0-vs-7 classification. What is this 'svm_node' structure Suhas mentions?

marek 05-26-2013 12:52 PM

Re: on the right track?
 
Quote:

Originally Posted by Anne Paulson (Post 9704)
Now I have a different problem (sorry to bug you all, and thanks for your help). I'm getting the right (or at least the same) results as the rest of you, but now I can't get answers to Q5 and Q6: more than one statement comes out true, and the numbers are not increasing/decreasing monotonically.

Suggestions? Hints?

I managed to reproduce the results earlier in this thread, so I also had some confidence until I hit Q5 and Q6.

As I try to figure out what's going wrong, I have one initial question: what are we supposed to do with -h? Should we leave it at the default of 1? Setting -h 0 has no impact on the earlier questions but dramatically changes my answers for Q5 and Q6... and it also takes an incredibly long time to compute.

Also, regardless of which setting I choose, I always get the warning about hitting the maximum number of iterations. Any clues as to why that is or how I can prevent it?

Edit: Never mind. After hours of trying to figure it out, minutes after making a post I discovered I had fat-fingered -d 22 instead of -d 2. However, I am still curious about the effect of -h, if anyone knows.

dlammerts 05-26-2013 04:29 PM

Re: on the right track?
 
Quote:

Originally Posted by Sendai (Post 9581)
I thought it would be nice to have a way to check if we're on the right track with problems 2-5 without giving away the answers. I ran SVM (with the polynomial kernel) for a couple of cases and pasted the results below. Are others getting the same numbers?

0 vs 7 classifier, C=0.01, Q=2
number of support vectors = 861
E_{in} = 0.071778
E_{out} = 0.063241

2 vs 8 classifier, C=0.1, Q=3
number of support vectors = 721
E_{in} = 0.234878
E_{out} = 0.291209

Great idea. I got (almost) identical results using svm() from the e1071 R package:

SVM model for 0-vs-7 classification with C = 0.01 and Q = 2: SVs = 861 Ein = 0.07177814 Eout = 0.06324111
SVM model for 2-vs-8 classification with C = 0.1 and Q = 3: SVs = 722 Ein = 0.2348782 Eout = 0.2912088

mvellon 05-27-2013 08:17 PM

Re: on the right track?
 
I'm stuck halfway through this problem. I'm trying to use the C# version of libsvm, and I think it's working, but I can't corroborate the numbers I'm seeing here. Actually, I match on the number of support vectors, but my E_in and E_out numbers are significantly different.

For 0 vs. 7 with Q=2 and C=.01 I get 861 SVs but using Sign(svm_predict) and counting sign mismatches I get:

Ein=.060 and
Eout=.057

Looking at 2 vs 8 with C=.1 and Q=3, I get 721 SVs but errors are much worse:

Ein=.67
Eout=.63

Since I'm getting the right number of support vectors, I think things are somewhat ok, but I'm perplexed regarding the results from svm_predict.

One dumb question: I presume the right way to feed data into libsvm (using its data file reading capabilities) is to manually subset the data as well as to prep it for libsvm format. Is this correct? When processing 2 vs 8, for example, I'll generate a +1 for "2" data, a -1 for "8" data and then discard the rest. Is this the right approach?

mvellon 05-27-2013 09:11 PM

Re: on the right track?
 
Hmm: the command-line version of libsvm corroborates the numbers posted by others. I suspect the (mjohnson) .NET version has problems.

Elroch 05-28-2013 03:45 AM

Re: on the right track?
 
Quote:

Originally Posted by mvellon (Post 10967)
I'm stuck halfway through this problem. I'm trying to use the C# version of libsvm, and I think it's working, but I can't corroborate the numbers I'm seeing here. Actually, I match on the number of support vectors, but my E_in and E_out numbers are significantly different.

For 0 vs. 7 with Q=2 and C=.01 I get 861 SVs but using Sign(svm_predict) and counting sign mismatches I get:

Ein=.060 and
Eout=.057

Looking at 2 vs 8 with C=.1 and Q=3, I get 721 SVs but errors are much worse:

Ein=.67
Eout=.63

Since I'm getting the right number of support vectors, I think things are somewhat ok, but I'm perplexed regarding the results from svm_predict.

One dumb question: I presume the right way to feed data into libsvm (using its data file reading capabilities) is to manually subset the data as well as to prep it for libsvm format. Is this correct? When processing 2 vs 8, for example, I'll generate a +1 for "2" data, a -1 for "8" data and then discard the rest. Is this the right approach?

Yes, you need to extract just the data for those two digits, and your outputs are as instructed in the assignment.

Since several alternative interfaces to LIBSVM have got similar results (I used the R interface through the e1071 package myself), you might consider trying a different interface, if there is one you could use in the limited time. Other than that, the combination of a right-looking support vector count and wrong-looking errors (which behave spectacularly differently in the two test runs) is difficult to explain by something you have done.

chiraz 05-28-2013 05:29 AM

Re: on the right track?
 
I'm getting two possible cases (answers) for Q5. Randomisation did not help. Anyone have the same problem?

Not sure if there are any parameters to be tweaked that could help separate the cases...

mluser 05-28-2013 06:49 AM

Re: on the right track?
 
Was able to verify my numbers thanks to the original post. Got exactly the same results using libsvm with Octave :D

Elroch 05-28-2013 07:40 AM

Re: on the right track?
 
Quote:

Originally Posted by chiraz (Post 10970)
I'm getting two possible cases (answers) for Q5. Randomisation did not help. Anyone have the same problem?

Not sure if there are any parameters to be tweaked that could help separate the cases...

It's safe to say exactly one of (a) to (e) is correct, as otherwise the question would have been fixed by now. It is not the case that people are generally finding two of the answers to be correct. Checking your individual results should get you there.

[EDIT: checking posts on the previous page, around #17, might also be helpful]

chiraz 05-28-2013 09:56 AM

Re: on the right track?
 
Thanks for the hint!

chiraz 05-28-2013 10:02 AM

Re: on the right track?
 
So it's just what I thought: it boils down to interpreting "decreasing" as "strictly decreasing". C'mon, isn't that silly?

jlaurentum 05-28-2013 10:44 AM

Re: on the right track?
 
I used the kernlab package in R, specifically the ksvm function in that package. I had no problems whatsoever with the functions; their use was straightforward, and the package was able to handle that dataset size (set the scaled=FALSE parameter and you're good).

@Elroch: could you explain how to set up libsvm through the e1071 package interface?

Elroch 05-28-2013 11:47 AM

Re: on the right track?
 
Doesn't this work for you: http://www.csie.ntu.edu.tw/~cjlin/libsvm/R_example.html ?


