LFD Book Forum > Course Discussions > Online LFD course > Homework 7
#11, 05-23-2013, 03:28 AM
catherine (Member; Join Date: Apr 2013; Posts: 18)
Re: Q9, SVM vs PLA

@jlaurentum:
These are the parameters I fed to ipop:
Code:
# hard-margin SVM dual via kernlab's ipop:
# minimise c'a + (1/2) a'Ha  subject to  b <= Aa <= b + r  and  l <= a <= u
H = sweep(XIn[,2:3],MARGIN=1,yIn, '*')  # rows y_i * x_i; ipop accepts this decomposed form Z, with ZZ' = H
c = matrix(rep(-1,n))                   # minimising -sum(alpha) maximises sum(alpha)
A = t(yIn)                              # with b = 0 and r = 0: the equality constraint sum_i alpha_i y_i = 0
b = 0
l = matrix(rep(0,n))                    # alpha_i >= 0
u = matrix(rep(1e7,n))                  # large finite stand-in for +Inf
r = 0
sv = ipop(c,H,A,b,l,u,r)
I'm not sure why, but I had to tweak u to get a zero Ein across the board, i.e. for all 1,000 runs. I have to admit that the technicalities of quadratic programming go a tad over my head :/
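(A minimal sketch of turning those alphas into a hypothesis and checking Ein; primal() is kernlab's accessor for the QP solution, the 1e-5 support-vector threshold is an arbitrary tolerance, and yIn is assumed to be a plain vector of +/-1 labels:)
Code:
alphas <- primal(sv)                          # QP solution vector
w <- colSums((alphas * yIn) * XIn[,2:3])      # w = sum_i alpha_i y_i x_i
svs <- which(alphas > 1e-5)                   # support vectors: alpha_i > 0 up to tolerance
b0 <- yIn[svs[1]] - sum(w * XIn[svs[1],2:3])  # from y_s (w'x_s + b) = 1 at any support vector
Ein <- mean(sign(XIn[,2:3] %*% w + b0) != yIn)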

Last edited by catherine; 05-23-2013 at 03:32 AM. Reason: more details
#12, 05-23-2013, 03:55 AM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: Q9, SVM vs PLA

Quote:
Originally Posted by catherine:
@jlaurentum:
These are the parameters I fed to ipop:
Code:
H = sweep(XIn[,2:3],MARGIN=1,yIn, '*')
c = matrix(rep(-1,n))
A = t(yIn)
b = 0
l = matrix(rep(0,n))
u = matrix(rep(1e7,n))
r = 0
sv = ipop(c,H,A,b,l,u,r)
I'm not sure why, but I had to tweak u to get a zero Ein across the board, i.e. for all 1,000 runs. I have to admit that the technicalities of quadratic programming go a tad over my head :/
I just came to this forum to discuss this same matter. My last post was confusing, probably because I was a bit confused!

The u parameter is just a vector of upper bounds for the alphas, but our problem only has lower bounds. I wanted to use a vector of Infs, but ipop didn't like that, so I just played around to find a value for u that would work. For some reason I found extremely large values gave errors, but merely large ones (like the one you used) worked fine. I don't know why either. As you probably realised, all you need to check is that none of the alphas attains the upper bound you use; if none does, the upper bounds have had no effect.
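A one-liner for that check, assuming sv is the object returned by ipop above (primal() is kernlab's accessor for the solution vector):
Code:
alphas <- primal(sv)
max(alphas) < 1e7   # TRUE means no alpha attained the artificial upper bound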

How did you arrive at your choice of H?
#13, 05-23-2013, 06:57 AM
catherine (Member; Join Date: Apr 2013; Posts: 18)
Re: Q9, SVM vs PLA

Quote:
Originally Posted by Elroch:
You can think of it like this. How big are the sets of misclassified points in the two experiments? How many of your 1000 points are misclassified on average? How accurate an estimate do you think you are getting for each of the misclassified sets?

Actually it's worse than estimating the misclassification error for one method: if M_{PLA} and M_{SVM} are the two sets of misclassified points, you are only interested in the points that are in one set but not the other.

Note: if you have a fraction f of a set that you are trying to estimate and you use N sample points, it's not difficult to calculate the standard deviation of such an estimate, which you can use to get a very good handle on how reliable your estimates and conclusions are.
Hi Elroch, from your comment above I understand that my test set was too small. How large should it be? How did you go about estimating the 'disagreement' between the target function and the final PLA / SVM hypotheses? According to the HW instructions, this 'disagreement' can either be calculated exactly or approximated by generating a sufficiently large set of points on which to evaluate it. How would one go about calculating it exactly?
#14, 05-23-2013, 07:04 AM
catherine (Member; Join Date: Apr 2013; Posts: 18)
Re: Q9, SVM vs PLA

Quote:
Originally Posted by Elroch:

How did you arrive at your choice of H?
I just followed slide 15 (The solution - quadratic programming) and the documentation of the kernlab package.
#15, 05-23-2013, 07:22 AM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: Q9, SVM vs PLA

Quote:
Originally Posted by catherine:
Hi Elroch, from your comment above I understand that my test set was too small. How large should it be? How did you go about estimating the 'disagreement' between the target function and the final PLA / SVM hypotheses? According to the HW instructions, this 'disagreement' can either be calculated exactly or approximated by generating a sufficiently large set of points on which to evaluate it. How would one go about calculating it exactly?
Calculating it exactly involves doing some fiddly geometry to determine the area between two lines. The fiddliness is due to the fact that the lines can cross any of the sides of the square; it would be easier if the region were a circle, or if we knew that the crossing point was near the center of the region, in which case the angle between the lines would be enough. I had a look at calculating it in an earlier homework, but decided it wasn't worth the bother.

More straightforward is to make the sample big enough. 1000 is a long way short of what you need, because all but 10-20 of those points are correctly classified by both algorithms.

The uncertainty in estimates is quite apparent. Suppose you have a method and want to estimate its accuracy. In a number of runs you find an average of 10 of 1000 random points are misclassified. Each point is a perfectly random sample from a distribution which has about 1% of one value and 99% of the other. In a single run, there is huge uncertainty on this estimate: getting 5 or 15 misclassified points is going to happen. Because this is happening with the misclassified points for each of the two methods, the uncertainty in the difference between them is even larger.
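To put rough numbers on this (my arithmetic, not in the original post): the count of misclassified points in a sample of N is binomial, so its standard deviation is sqrt(N*f*(1-f)):
Code:
f <- 0.01                # true misclassification rate
N <- 1000                # test points per run
sqrt(N * f * (1 - f))    # about 3.15, so counts like 5 or 15 instead of 10 are routine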

The consequence is that the advantage of the better method appears a lot less when the sample is small, because this noise in the estimates dominates a rather delicate signal.

Hence I used 100,000 random points, so that the number of misclassified points for each method was a lot more stable. Empirically, this gave quite repeatable results. The uncertainty in the misclassification error of each of the two algorithms can be estimated separately by doing a moderate number of repeat runs (e.g. with 10,000 points each) and looking at the range of values found. You can then even combine the runs and infer a good estimate of the uncertainty of the combined run (based on the variance of the estimate being inversely proportional to the number of samples).
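For concreteness, a sketch of that Monte Carlo estimate, under the assumptions that points are uniform on [-1,1]^2 and that the target and the two final hypotheses are available as weight vectors wf, wpla and wsvm whose first component multiplies x0 = 1 (all names hypothetical):
Code:
Ntest <- 100000
Xtest <- cbind(1, matrix(runif(2 * Ntest, -1, 1), ncol = 2))  # rows (x0, x1, x2)
yf <- sign(Xtest %*% wf)                                      # target labels
mean(sign(Xtest %*% wpla) != yf)                              # disagreement of PLA with f
mean(sign(Xtest %*% wsvm) != yf)                              # disagreement of SVM with f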

[Could you give a link to the documentation you mentioned? I can't find a reference to "sweep" in the documentation I used at http://cran.r-project.org/web/packag...ab/kernlab.pdf, and I don't quite see what it is doing from the R documentation of that function.]
#16, 05-23-2013, 09:23 AM
jlaurentum (Member; Join Date: Apr 2013; Location: Venezuela; Posts: 41)
Re: Q9, SVM vs PLA

Hello Catherine:

I tried your code using the sweep function (which is totally mysterious to me, so like Elroch I'd like to ask how you arrived at it). I got the following error message (my R is in Spanish; it says the subscript is out of bounds):

Code:
Error en sweep(x[, 2:3], MARGIN = 1, y, "*") : 
  subíndice fuera de los límites
Ejecución interrumpida
So I tried my version of the H matrix:

Code:
H <- kernelPol(vanilladot(), x, , y)  # full n x n matrix: H_ij = y_i y_j (x_i . x_j)
where x and y are the input matrix and output vector respectively. This is what I got:

Code:
Error en solve.default(AP, c(c.x, c.y)) : 
  sistema es computacionalmente singular: número de condición recíproco = 1.92544e-16
Calls: ipop -> ipop -> solve -> solve.default
Ejecución interrumpida
Hmmm... So a computationally singular matrix. I played around with the u vector of upper constraints like so:

Code:
u <- matrix(rep(1e3,N))
And it somehow went through (sometimes). However, apart from the fact that the results were complete nonsense, the in-sample error was invariably non-zero. As Elroch remarked, I also had the intuitive realization that if the upper bound was not high enough, some alpha values would always reach it (in this case 1e3 = 1000), and that did not seem right...

Ahh... quadratic programming and its mysteries!

That's why I gave up on ipop altogether and decided to use ksvm:

Code:
x <- as.matrix(training_data[,2:3])  # drop the x_0 column
y <- as.matrix(training_data[,4])
svmmodel <- ksvm(x, y, kernel="vanilladot", C=100, type="C-svc")
But the answers I obtained for Q9 and Q10 were not correct.
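One possible culprit, which the thread never settles: by default ksvm scales and centres the inputs (scaled = TRUE), which changes the maximum-margin separator relative to the raw coordinates. A sketch of the unscaled call, using kernlab's nSV() accessor for the support-vector count:
Code:
svmmodel <- ksvm(x, y, kernel = "vanilladot", C = 100, type = "C-svc", scaled = FALSE)
nSV(svmmodel)   # number of support vectors
A separate observation on the singular-matrix error: with a linear kernel on 2-d inputs, the full n x n matrix from kernelPol has rank at most 2, so it may well be computationally singular; Catherine's n x 2 decomposed form avoids building it.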
#17, 05-24-2013, 02:16 AM
catherine (Member; Join Date: Apr 2013; Posts: 18)
Re: Q9, SVM vs PLA

Hi guys,

Sorry for the confusion:

1. The X matrix in my code excerpt above includes x0 (I used the same matrix for PLA), so leave out the index sub-setting if you are using a separate matrix for SVM.

2. sweep(XIn[,2:3], MARGIN=1, yIn, '*') is the same as apply(XIn[,2:3], 2, function(x) {x * yIn}) (see the quick check after this list)

3. Here is the kernlab documentation I used: http://cran.r-project.org/web/packag...ab/kernlab.pdf
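A quick way to convince yourself of that equivalence (a made-up example, not from the thread):
Code:
XIn <- cbind(1, matrix(rnorm(20), ncol = 2))  # 10 points with an x0 column
yIn <- sign(rnorm(10))                        # random +/-1 labels
a <- sweep(XIn[,2:3], MARGIN = 1, yIn, '*')
b <- apply(XIn[,2:3], 2, function(x) {x * yIn})
all.equal(a, b)   # TRUE: both scale row i by yIn[i]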
#18, 05-24-2013, 09:26 AM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: Q9, SVM vs PLA

Thanks Catherine. Does that make sense regarding the nature of the errors due to sample size?