LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 7 (http://book.caltech.edu/bookforum/forumdisplay.php?f=136)
-   -   Q9, SVM vs PLA (http://book.caltech.edu/bookforum/showthread.php?t=4301)

 catherine 05-23-2013 02:28 AM

Re: Q9, SVM vs PLA

@jlaurentum:
These are the parameters I fed to ipop:
Code:

```
H = sweep(XIn[,2:3], MARGIN = 1, yIn, '*')  # rows are y_i * x_i; ipop accepts an n x m
                                            # matrix as the factor Z with H = Z Z'
c = matrix(rep(-1, n))                      # linear term: minimise -sum(alpha)
A = t(yIn)                                  # equality constraint sum(y_i * alpha_i) = 0
b = 0
l = matrix(rep(0, n))                       # lower bounds: alpha_i >= 0
u = matrix(rep(1e7, n))                     # upper bounds (ideally Inf)
r = 0
sv = ipop(c, H, A, b, l, u, r)
```
I'm not sure why, but I had to tweak u to get a zero Ein across the board, i.e. for all 1,000 runs. I have to admit that the technicalities of quadratic programming go a tad over my head :/

 Elroch 05-23-2013 02:55 AM

Re: Q9, SVM vs PLA

Quote:
 Originally Posted by catherine (Post 10929) @jlaurentum: These are the parameters I fed to ipop: Code: ```H = sweep(XIn[,2:3],MARGIN=1,yIn, '*') c = matrix(rep(-1,n)) A = t(yIn) b = 0 l = matrix(rep(0,n)) u = matrix(rep(1e7,n)) r = 0 sv = ipop(c,H,A,b,l,u,r)``` I'm not sure why but I had to tweak u to get a 0 Ein across the board, ie for all 1,000 iterations. I have to admit that the technicalities of quadratic programming go a tad over my head :/
I just came to this forum to discuss this same matter. My last post was confusing, probably because I was a bit confused!

The u parameter is just a vector of upper bounds for the inequalities, but our problem only has lower bounds. I wanted to use a vector of Infs, but ipop didn't like that, so I just played around to find a value for u that would work. For some reason extremely large values gave errors, but merely large ones (like the one you used) worked fine. I don't know why either. As you probably realised, all you need to check is that none of the alphas attains the upper bound you use; if that is the case, the upper bounds have had no effect.
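That check is easy to script. A minimal sketch (the helper name and the toy alpha values are mine; in the actual homework the alphas would come from `primal(sv)` on the ipop result):

```r
# Check that no alpha is pinned at the upper bound u; if none is,
# the artificial upper bound had no effect on the solution.
hits_upper_bound <- function(alphas, u, tol = 1e-4 * u) {
  any(alphas > u - tol)
}

# Toy illustration with made-up alphas and u = 1e7:
alphas <- c(0, 0.73, 0, 1.41, 2.18, 0)
hits_upper_bound(alphas, 1e7)   # FALSE: the bound did not bite
```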

How did you arrive at your choice of H?

 catherine 05-23-2013 05:57 AM

Re: Q9, SVM vs PLA

Quote:
 Originally Posted by Elroch (Post 10907) You can think of it like this. How big are the sets of misclassified points in the two experiments? How many of your 1000 points are misclassified on average? How accurate an estimate do you think you are getting for each of the misclassified sets? Actually it's worse than estimating the misclassification error for one method: if S1 and S2 are the two sets of misclassified points, you are only interested in the points that are in one set but not the other. Note: if you have a fraction of a set that you are trying to estimate and you use N sample points, it's not difficult to calculate the standard deviation of such an estimate, which you can use to get a very good handle on how reliable your estimates and conclusions are.
Hi Elroch, from your comment above I understand that my test set was too small. How large should it be? How did you go about estimating the 'disagreement' between the target function and the final PLA / SVM hypotheses? According to the HW instructions, this 'disagreement' can either be calculated exactly or approximated by generating a sufficiently large set of points on which to evaluate it. How would you go about calculating it exactly?

 catherine 05-23-2013 06:04 AM

Re: Q9, SVM vs PLA

Quote:
 Originally Posted by Elroch (Post 10930) How did you arrive at your choice of H?
I just followed slide 15 (The solution - quadratic programming) and the documentation of the kernlab package.

 Elroch 05-23-2013 06:22 AM

Re: Q9, SVM vs PLA

Quote:
 Originally Posted by catherine (Post 10931) Hi Elroch, from your comment above I understand that my test set was too small. How large should it be? How did you go about estimating the 'disagreement' between the target function and the final PLA / SVM hypotheses? According to the HW instructions, this 'disagreement' can either be calculated exactly or approximated by generating a sufficiently large set of points on which to evaluate it. How would you go about calculating it exactly?
Calculating it exactly involves doing some fiddly geometry to determine the area between the two lines. The fiddliness is due to the fact that the lines can cross any of the sides of the square (it would be easier if the region were a disc, or if we knew that the crossing point was near the centre of the region, in which case the angle between the lines would be enough). I had a look at calculating it in an earlier homework, but decided it wasn't worth the bother.

More straightforward is to make the sample big enough. 1000 is a long way short of what you need, because all except 10-20 of those points are accurately classified by both algorithms.

The uncertainty in the estimates is quite apparent. Suppose you have a method and want to estimate its accuracy, and in a number of runs you find that on average 10 of 1000 random points are misclassified. Each point is a perfectly random sample from a distribution which puts about 1% of its mass on one value and 99% on the other. In a single run there is huge uncertainty in the estimate: getting 5 or 15 misclassified points is entirely plausible. Because this is happening with the misclassified points of each of the two methods, the uncertainty in the difference between them is even larger.
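The standard deviation here is just the binomial one; a quick sketch with the numbers from this example (p = 0.01, N = 1000):

```r
# Standard deviation of the *count* of misclassified points in a
# sample of N points, when the true error rate is p.
p <- 0.01
N <- 1000
sd_count <- sqrt(N * p * (1 - p))   # about 3.15 points
sd_count

# So 5 or 15 misclassified points is only ~1.6 standard deviations
# away from the mean of 10, i.e. a single run can easily report an
# error estimate that is off by 50% in either direction.
```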

The consequence is that the advantage of the better method appears a lot smaller when the sample is small, because this noise in the estimates dominates a rather delicate signal.

Hence I used 100,000 random points, so that the number of misclassified points for each method was a lot more stable; empirically, this gave quite repeatable results. The uncertainty in the misclassification error of each of the two algorithms can be estimated separately by doing a moderate number of repeat runs (e.g. with 10,000 points each) and looking at the range of values found. You can then combine the runs and infer a good estimate of the uncertainty of the combined run (based on the variance of the estimate being inversely proportional to the number of samples).
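For concreteness, here is a minimal sketch of the Monte Carlo disagreement estimate with two made-up linear hypotheses (the lines are chosen so the exact answer is known; they are not the homework's hypotheses):

```r
set.seed(1)
N <- 100000

# Two hypothetical linear classifiers on [-1,1]^2:
# f(x) = sign(x1) and g(x) = sign(x1 - 0.1).
# Their exact disagreement region is the strip 0 < x1 < 0.1,
# which has probability 0.1/2 = 0.05 under the uniform distribution.
x1 <- runif(N, -1, 1)
x2 <- runif(N, -1, 1)
f <- sign(x1)
g <- sign(x1 - 0.1)

disagreement <- mean(f != g)
disagreement   # close to 0.05
```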

[Could you give a link to the documentation you mentioned? I can't find a reference to "sweep" in the documentation I used at http://cran.r-project.org/web/packag...ab/kernlab.pdf and I don't quite see what this is doing from the R documentation of that function.]

 jlaurentum 05-23-2013 08:23 AM

Re: Q9, SVM vs PLA

Hello Catherine:

I tried your code using the sweep function (which is totally mysterious to me, so like Elroch I'd like to ask how you arrived at it). I got the following error message (my R session is in Spanish):

Code:

```
Error en sweep(x[, 2:3], MARGIN = 1, y, "*") :
  subíndice fuera de los límites
Ejecución interrumpida
```
In other words, subscript out of bounds.
So I tried my version of the H matrix:

Code:

`H <- kernelPol(vanilladot(), x, , y)  # H[i,j] = y_i * y_j * <x_i, x_j>`
where x and y are the input matrix and output vector respectively. This is what I got:

Code:

```
Error en solve.default(AP, c(c.x, c.y)) :
  sistema es computacionalmente singular: número de condición recíproco = 1.92544e-16
Calls: ipop -> ipop -> solve -> solve.default
Ejecución interrumpida
```
Hmmm... So a computationally singular matrix. I played around with the u vector of upper constraints like so:

Code:

`u <- matrix(rep(1e3,N))`
And it somehow went through (sometimes). However, apart from the fact that the results were complete nonsense, the in-sample error was invariably non-zero. As Elroch remarked, I also had the intuitive realization that if the upper bound was not high enough, some alpha values would always reach it (in this case 1e3 = 1000), and that did not seem right...

Ahh... quadratic programming and its mysteries!

That's why I gave up on ipop altogether and decided to use ksvm:

Code:

```
x <- as.matrix(training_data[,2:3])  # pull out x_0
y <- as.matrix(training_data[,4])
svmmodel <- ksvm(x, y, kernel = "vanilladot", C = 100, type = "C-svc")
```
But my answers obtained in Q9 and Q10 were not correct.
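For what it's worth, one way to sanity-check a ksvm fit is to look at the in-sample error and the support-vector count directly. A sketch, with hypothetical toy data standing in for training_data (the separating line and margin filter are made up for illustration):

```r
library(kernlab)

# Hypothetical toy data: 20 cleanly separable points in [-1,1]^2,
# labelled by the line x1 + x2 = 0, with a margin kept around it.
set.seed(42)
x <- matrix(runif(200, -1, 1), ncol = 2)
x <- x[abs(x[, 1] + x[, 2]) > 0.2, ][1:20, ]
y <- sign(x[, 1] + x[, 2])

svmmodel <- ksvm(x, y, kernel = "vanilladot", C = 100, type = "C-svc")

yhat <- as.numeric(as.character(predict(svmmodel, x)))
ein  <- mean(yhat != y)   # should be 0 on separable data with large C
nsv  <- nSV(svmmodel)     # number of support vectors (what Q10 asks about)
```

If ein is non-zero on separable data, or some alphas sit at C, the bound C is interfering, which is the same symptom as the capped u in ipop.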

 catherine 05-24-2013 01:16 AM

Re: Q9, SVM vs PLA

Hi guys,

Sorry for the confusion:

1. The X matrix in my code excerpt above includes x0 (I used the same matrix for PLA), so leave out the index sub-setting if you are using a separate matrix for SVM.

2. sweep(XIn[,2:3], MARGIN=1, yIn, '*') is the same as apply(XIn[,2:3], 2, function(x) {x * yIn} )

3. Here is the kernlab documentation I used: http://cran.r-project.org/web/packag...ab/kernlab.pdf
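The equivalence in point 2 is easy to verify on a toy matrix (and R's column-wise recycling gives a third, shorter spelling, `X * y`):

```r
X <- matrix(1:6, nrow = 3)            # 3 points, 2 features
y <- c(1, -1, 1)                      # labels

m1 <- sweep(X, MARGIN = 1, y, '*')    # multiply row i by y[i]
m2 <- apply(X, 2, function(col) col * y)
m3 <- X * y                           # recycling: y repeats down each column

all.equal(m1, m2)   # TRUE
all.equal(m1, m3)   # TRUE
```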

 Elroch 05-24-2013 08:26 AM

Re: Q9, SVM vs PLA

Thanks Catherine. Does that make sense about the nature of the errors due to sample size?
