LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 1 (http://book.caltech.edu/bookforum/forumdisplay.php?f=130)
-   -   PLA - Need Guidance (http://book.caltech.edu/bookforum/showthread.php?t=838)

samirbajaj 07-11-2012 03:48 PM

PLA - Need Guidance
 
Greetings!

I am working on the Perceptron part of the homework, and having spent several hours on it, I'd like to know if I am proceeding in the right direction:

1) My implementation converges in 'N' iterations. This looks rather fishy. Any comments would be appreciated. (Otherwise I may have to start over :-( maybe in a different programming language)

2) I don't understand the Pr( f(x) != g(x) ) expression -- what exactly does this mean? Once the algorithm has converged, presumable f(x) matches g(x) on all data, so the difference is zero.


Thanks.

-Samir

yaser 07-11-2012 05:13 PM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by samirbajaj (Post 3371)
I don't understand the Pr( f(x) != g(x) ) expression -- what exactly does this mean? Once the algorithm has converged, presumable f(x) matches g(x) on all data, so the difference is zero

On all data, yes. However, the probability is with respect to {\bf x} over the entire input space, not restricted to {\bf x} being in the finite data set used for training.

jakvas 07-12-2012 07:30 AM

Re: PLA - Need Guidance
 
If we try to evaluate Pr(f(x)!=g(x)) experimentaly how many random verification points should we use to get a significant answear?

I am tempted to believe that Hoeffding's inequality is applicable in this case to a single experiment but since we are averaging out over very many experiments I'm not sure on how to choose the amount of those verification data points (I ultimately worked with 10000 per experiment just to be sure).

yaser 07-12-2012 09:56 AM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by jakvas (Post 3374)
I am tempted to believe that Hoeffding's inequality is applicable in this case to a single experiment but since we are averaging out over very many experiments I'm not sure on how to choose the amount of those verification data points (I ultimately worked with 10000 per experiment just to be sure).

Indeed, the average helps smooth out statistical fuctuations. Your choice of 10000 points is pretty safe.

jtwang 07-16-2012 09:19 PM

Re: PLA - Need Guidance
 
How would you determine f(x) == g(x) exactly - since the set of possible hypotheses is infinite (3 reals), wouldn't Pr(f(x) != g(x)) == 1? Obviously you could choose some arbitrary epsilon but then that wouldn't be "exactly."

yaser 07-16-2012 09:39 PM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by jtwang (Post 3450)
How would you determine f(x) == g(x) exactly - since the set of possible hypotheses is infinite (3 reals), wouldn't Pr(f(x) != g(x)) == 1? Obviously you could choose some arbitrary epsilon but then that wouldn't be "exactly."

f({\bf x})=g({\bf x}) is per point {\bf x}. It may be true for some {\bf x}'s and false for others, hence the notion of probability that it's true (probability with respect to {\bf x}). We are not saying that f is identically equal to g.

dobrokot 01-08-2013 02:15 PM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by jakvas (Post 3374)
I'm not sure on how to choose the amount of those verification data points (I ultimately worked with 10000 per experiment just to be sure).

Hoeffding inequality given in same lesson can help to choose number of points. g(x)!=f(x) can be thinked as red marble

nroger 01-09-2013 07:18 AM

Re: PLA - Need Guidance
 
I still don't understand this Pr() function. Given two (linear) functions f and g, what is the Pr() of f and g?
Thanks...Neil

yaser 01-09-2013 08:12 AM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by nroger (Post 8469)
I still don't understand this Pr() function. Given two (linear) functions f and g, what is the Pr() of f and g?
Thanks...Neil

This is the probability of an event, the event in the case discussed in this thread being that f({\bf x})\ne g({\bf x}), which means you pick {\bf x} at random according to the probability distribution over the input space {\cal X} and evaluate "the fraction of time" that f does not give the same value as g for the {\bf x} you pick.

BTW, anyone who wants to refresh some of the prerequisite material for the course, here are some recommendations:

http://book.caltech.edu/bookforum/showthread.php?t=3720

sricharan92 01-12-2013 02:21 PM

Re: PLA - Need Guidance
 
Sir

If I understand correctly, we are using N = 10 training points out of a randomly generated x points according to the target function f for perceptron learning and Pr(f(x) != g(x)) should be calculated considering all x points and not just the training data. Am I right ?

yaser 01-12-2013 06:26 PM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by sricharan92 (Post 8602)
Pr(f(x) != g(x)) should be calculated considering all x points and not just the training data. Am I right?

You are right. Considering all {\bf x} points can be done analytically or approximated by Monte Carlo using a new, large set of points generated at random.

butterscotch 01-12-2013 06:28 PM

Re: PLA - Need Guidance
 
For Pr(f(x) != g(x)), you would want to generate a lot of random points. However, I would not include the training data for this calculation. f & g would always agree on this training data, and wouldn't represent Pr(f(x) != g(x)) fairly.

vbipin 01-14-2013 07:18 PM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by yaser (Post 3375)
Indeed, the average helps smooth out statistical fuctuations. Your choice of 10000 points is pretty safe.

Dear Professor,

Can you kindly explain how we can calculate this number. How can we ensure that the number is "sufficiently large"

Thanks,
Bipin

legolas_sid 01-14-2013 07:38 PM

Re: PLA - Need Guidance
 
Hi all,
it would be great if any of you could tell me if my understanding of this algorithm is correct.

1)pick a target function(f)(i.e a line in the [-1,1] * [-1,1] plane)

2)generate a set of training examples of the form ((x_d,y_d),output)

(I am confused from this point onwards,please correct me if I am wrong:) )

3)we initially set H as a vector of 0.0s

4)apply the perceptron iteration till the h function 'linearly seperates' all the data points.(we call this g).

5)test this g on a set of new data points (test_data,f(test_data)).

6)compare this with h and calculate error probability.

yaser 01-14-2013 07:40 PM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by vbipin (Post 8663)
Dear Professor,

Can you kindly explain how we can calculate this number. How can we ensure that the number is "sufficiently large"

Thanks,
Bipin

The probability for a single random point to be misclassified is, say, \mu. Therefore the variance for one point (1 if misclassified, 0 if classified correctly) is \mu(1-\mu) which is at most 0.25 independently of \mu. If you average the misclassification value of 10000 points, the expected value will be \mu (which is what you want) and the variance will be at most 0.25/10000 (because of independence). The standard deviation which is the square root of this variance gives you an indication of the "error bar" around the expected value that you are likely to get in your estimate. In the multiple-choice setup, we want the error bar to be small enough to make it highly unlikely that your estimate will take you away from the correct answer to the nearest incorrect answer. This is why 10000 is "sufficiently large" in this case.

yaser 01-14-2013 07:48 PM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by legolas_sid (Post 8665)
Hi all,
it would be great if any of you could tell me if my understanding of this algorithm is correct.

1)pick a target function(f)(i.e a line in the [-1,1] * [-1,1] plane)

2)generate a set of training examples of the form ((x_d,y_d),output)

(I am confused from this point onwards,please correct me if I am wrong:) )

3)we initially set H as a vector of 0.0s

4)apply the perceptron iteration till the h function 'linearly seperates' all the data points.(we call this g).

5)test this g on a set of new data points (test_data,f(test_data)).

6)compare this with h and calculate error probability.

You are correct, with step 3 put more accurately as "we start with h\in{\cal H} that corresponds to a weight vector of 0's." In step 1, the problem specifies a particular way of picking the random line that will define the target function.

legolas_sid 01-14-2013 07:54 PM

Re: PLA - Need Guidance
 
thank you professor:)so in reality, step 1 represents the given problem.

iamtrk 01-15-2013 06:17 PM

Re: PLA - Need Guidance
 
Hi Prof. Yaser,

I need help in implementation of PLA in JAVA . How i did it is generate two random points , and calculate the line in ax+by = 1 format.

After this i generate 100 random points and calculate yn of each point and put them in a 100x3 matrix.

I start with a weight vector of size 2 ( 1 for calculating a , other for b) with initial values of zero.

Now i loop through the random point matrix and apply the PLA formula for the misclassified points .

The program stops when there are no misclassified points.

The problem i am facing is for 100 points its not converging quite frequently . And for 10 points it is converging in 2-3 iterations.

So this is my problem professor , can you olease help me in catching what i am missing.

Thanks,
Trk.

butterscotch 01-15-2013 06:56 PM

Re: PLA - Need Guidance
 
The weight vector should include a bias term. In this case dimension of input is 2, so weight vector will be of size 3. Right now you have ax + by = 1, while "1" never gets updated. I suspect when there are small number of pts, the algorithm could converge without the bias term, but with more pts (requiring higher accuracy for no pts to be misclassifed), it doesn't.

gah44 01-15-2013 07:20 PM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by jtwang (Post 3450)
How would you determine f(x) == g(x) exactly - since the set of possible hypotheses is infinite (3 reals), wouldn't Pr(f(x) != g(x)) == 1? Obviously you could choose some arbitrary epsilon but then that wouldn't be "exactly."

There are two lines, the original line that determines the separation between +1 and -1, and the line determined by the PLA. The questions ask what fraction of the space is different between the two lines. If they don't cross, that is the area between the two lines (divided by four, the total area). If they do cross, it is the area of two triangles (or, sometimes, quadrilaterals).

Each line can crosses two of the sides of the square. (I suppose it could also go right through a corner, but not likely). Handling all the possible combinations of the two lines is a lot of work.

In another thread I discussed how I did it, only counting, and computing the area of, cases where both lines go through the top and bottom. That is about 30% in my tests. By symmetry, there should also be 30% where both go through the left and right sides of the square The remaining cases might have a little less area, but I based my answer on just the lines going through the top and bottom of the square. Seemed more interesting than the choose random point method.

iamtrk 01-16-2013 04:00 AM

Re: PLA - Need Guidance
 
Hi butterscotch,

Thanks for the tip now its converging consistently, but it is taking much lesser iterations , for 10 training points avg iterations are 1.5 and for 100 touchpoints it
is around 10. These values are wrong according to the answer sheet.Can you help me with this. now i am using ax+by = c line equation. The bias term i am initiating it to
the constant term of the line that i created initially from two random points. For
misclassified points bias term is w = w+/- 1. Is this correct or still am i missing something.

Thanks,
Trk

alternate 01-16-2013 04:40 AM

Re: PLA - Need Guidance
 
Does your program print out the number of iterations and probability of error after every individual trial?

If not, you might want to add that so that you can watch it calculate and compare those numbers to the mean you get at the end. It will help narrow down whether the problem is in your algorithm or a simple math error.

I.e., if you're getting 1.5 for the average iterations per trial, then most of your individual trials should have numbers like 1 or 2 iterations.

iamtrk 01-16-2013 08:14 PM

Re: PLA - Need Guidance
 
Hi all,

I did a little mistake in check for convergence , now i corrected it still its taking far lesser iterations to converge . I am checking how the final values seperate the trainig
Points, it is matching perfectly . WHAT I THINK IS THAT I AM INITIATING W0 TO THE
CONSTANT TERM OF THE ORIGINAL LINE so the algo is taking lesser iterations . But If i initiate it to 0. It is not converging.

MY CONCERN IS IF W0 IS INITIATED TO 0. THEN IT IS NOT CONVERGING.
SHOULD I INITIATE W0 WITH SOME RANDOM NUMBER . Fellow students please
Help me here . I am stuck on this.

alternate 01-17-2013 02:57 AM

Re: PLA - Need Guidance
 
w0 is definitely supposed to be initialized to zero, so there's a logic error in your code somewhere. If whether or not you converge depends on the initial value, you're probably not actually updating w0. w0 should update every iteration.

It would be helpful if you can run a 1-trial test and manually examine the results of every iteration (see what the new weights are, how many points are currently misclassified.) This is really easy to do in the language I'm using but I'm not familiar with Java. You could probably print out the values or graph them visually.

Either way, changing the trials from 1000 to 1 and stepping through the algorithm may show you an obvious source of error. Then once one trial gives you the results you're supposed to get, doing multiple trials should work out cleanly. If you still can't figure out what's going wrong, then you could post here the output of the program for one trial.

mvellon 04-06-2013 03:54 PM

Re: PLA - Need Guidance
 
I think I've got the PLA working correctly, but I'm not sure that I can see how w0 ends up being anything other than 1, 0 or -1. Is this correct?

if w = w + xn*yn, but x0 always is 1 and yn is 1 or -1, then w0 is only ever going to change by 1 or -1. No?

(Correction: w0 only ever changes by 1 or -1)

yaser 04-06-2013 06:31 PM

Re: PLA - Need Guidance
 
Quote:

Originally Posted by mvellon (Post 10179)
I think I've got the PLA working correctly, but I'm not sure that I can see how w0 ends up being anything other than 1, 0 or -1. Is this correct?

if w = w + xn*yn, but x0 always is 1 and yn is 1 or -1, then w0 is only ever going to change by 1 or -1. No?

(Correction: w0 only ever changes by 1 or -1)

You are correct, and the result is that w_0 will always be an integer under this scheme. However, if the entire vector {\bf w} is scaled up or down, the separating line it represents does not change, so the {\bf w} you get is equivalent to other {\bf w}'s in which w_0 is not an integer.


All times are GMT -7. The time now is 03:21 PM.

Powered by vBulletin® Version 3.8.3
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.