 LFD Book Forum PLA - Need Guidance #11 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477 Re: PLA - Need Guidance

Quote:
 Originally Posted by sricharan92 Pr(f(x) != g(x)) should be calculated considering all x points and not just the training data. Am I right?
You are right. Considering all points can be done analytically or approximated by Monte Carlo using a new, large set of points generated at random.
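The Monte Carlo route can be sketched as follows (a minimal sketch with hypothetical example weight vectors; f and g are represented as weight vectors for lines in the form w0 + w1*x1 + w2*x2 = 0):

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_of(w, X):
    # Classify points X (with a bias coordinate of 1 prepended) by weight vector w.
    return np.sign(X @ w)

# Hypothetical example weight vectors for the target f and final hypothesis g.
w_f = np.array([0.1, 1.0, -1.0])
w_g = np.array([0.12, 0.9, -1.1])

# Fresh random points in [-1, 1] x [-1, 1] -- NOT the training points.
n = 10000
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])

# The fraction of fresh points where f and g disagree approximates Pr[f(x) != g(x)].
p_estimate = np.mean(sign_of(w_f, X) != sign_of(w_g, X))
```

The key point, as noted below in this thread, is that the points used here are generated fresh rather than reused from training.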
__________________
Where everyone thinks alike, no one thinks very much
#12
 butterscotch Caltech Join Date: Jan 2013 Posts: 43 Re: PLA - Need Guidance

For Pr(f(x) != g(x)), you would want to generate a lot of random points. However, I would not include the training data in this calculation. f and g always agree on the training data (the PLA runs until they do), so including it would not represent Pr(f(x) != g(x)) fairly.
#13
 vbipin Member Join Date: Jan 2013 Location: Shanghai Posts: 18 Re: PLA - Need Guidance

Quote:
 Originally Posted by yaser Indeed, the average helps smooth out statistical fluctuations. Your choice of 10000 points is pretty safe.
Dear Professor,

Can you kindly explain how we can calculate this number? How can we ensure that the number is "sufficiently large"?

Thanks,
Bipin
#14
 legolas_sid Junior Member Join Date: Jan 2013 Posts: 6 Re: PLA - Need Guidance

Hi all,
It would be great if any of you could tell me whether my understanding of this algorithm is correct.

1) Pick a target function f (i.e., a line in the [-1,1] x [-1,1] plane).

2) Generate a set of training examples of the form ((x_d, y_d), output).

(I am confused from this point onwards, please correct me if I am wrong.)

3) We initially set h as a vector of 0.0s.

4) Apply the perceptron iteration until the hypothesis 'linearly separates' all the data points (we call this g).

5) Test this g on a set of new data points (test_data, f(test_data)).

6) Compare g's output with f's output on these points and calculate the error probability.
#15 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477 Re: PLA - Need Guidance

Quote:
 Originally Posted by vbipin Dear Professor, Can you kindly explain how we can calculate this number. How can we ensure that the number is "sufficiently large" Thanks, Bipin
The probability for a single random point to be misclassified is, say, μ. Therefore the variance for one point (1 if misclassified, 0 if classified correctly) is μ(1-μ), which is at most 0.25 independently of μ. If you average the misclassification value of 10000 points, the expected value will be μ (which is what you want) and the variance will be at most 0.25/10000 (because of independence). The standard deviation, which is the square root of this variance, gives you an indication of the "error bar" around the expected value that you are likely to get in your estimate. In the multiple-choice setup, we want the error bar to be small enough to make it highly unlikely that your estimate will take you away from the correct answer to the nearest incorrect answer. This is why 10000 is "sufficiently large" in this case.
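The error-bar arithmetic above can be checked in a few lines (the 0.25 is the worst-case Bernoulli variance, attained at μ = 0.5):

```python
import math

# Variance of a single 0/1 misclassification indicator with probability mu
# is mu*(1 - mu), maximized at mu = 0.5 where it equals 0.25.
worst_case_var = 0.5 * (1 - 0.5)  # = 0.25

# Averaging N independent points divides the variance by N.
N = 10000
var_of_estimate = worst_case_var / N
error_bar = math.sqrt(var_of_estimate)  # standard deviation of the estimate
```

With N = 10000 the error bar comes out to 0.005, small compared to the gaps between the multiple-choice answers.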
__________________
Where everyone thinks alike, no one thinks very much
#16 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477 Re: PLA - Need Guidance

Quote:
 Originally Posted by legolas_sid Hi all, it would be great if any of you could tell me if my understanding of this algorithm is correct. 1)pick a target function(f)(i.e a line in the [-1,1] * [-1,1] plane) 2)generate a set of training examples of the form ((x_d,y_d),output) (I am confused from this point onwards,please correct me if I am wrong ) 3)we initially set H as a vector of 0.0s 4)apply the perceptron iteration till the h function 'linearly seperates' all the data points.(we call this g). 5)test this g on a set of new data points (test_data,f(test_data)). 6)compare this with h and calculate error probability.
You are correct, with step 3 put more accurately as "we start with the hypothesis h that corresponds to a weight vector of 0's." In step 1, the problem specifies a particular way of picking the random line that will define the target function.
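The corrected steps can be put together as follows (a minimal Python sketch with hypothetical helper names, not anyone's reference solution):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_target():
    # Step 1: a random line through two points in [-1,1]^2 defines f.
    p, q = rng.uniform(-1, 1, size=(2, 2))
    # Weights (w0, w1, w2) such that f(x) = sign(w0 + w1*x1 + w2*x2).
    return np.array([p[0]*q[1] - p[1]*q[0], p[1] - q[1], q[0] - p[0]])

def pla(X, y, max_iters=10000):
    # Steps 3-4: start from the all-zeros weight vector and update on
    # misclassified points until the data are linearly separated.
    w = np.zeros(3)
    for _ in range(max_iters):
        misclassified = np.where(np.sign(X @ w) != y)[0]
        if len(misclassified) == 0:
            break
        i = rng.choice(misclassified)
        w = w + y[i] * X[i]
    return w

# Step 2: training examples ((x1, x2), output), with a bias coordinate of 1.
w_f = random_target()
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, size=(100, 2))])
y = np.sign(X @ w_f)

w_g = pla(X, y)

# Steps 5-6: estimate Pr[f(x) != g(x)] on fresh test points.
X_test = np.column_stack([np.ones(10000), rng.uniform(-1, 1, size=(10000, 2))])
err = np.mean(np.sign(X_test @ w_f) != np.sign(X_test @ w_g))
```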
__________________
Where everyone thinks alike, no one thinks very much
#17
 legolas_sid Junior Member Join Date: Jan 2013 Posts: 6 Re: PLA - Need Guidance

Thank you, professor. So in reality, step 1 represents the given problem.
#18
 iamtrk Junior Member Join Date: Jan 2013 Posts: 8 Re: PLA - Need Guidance

Hi Prof. Yaser,

I need help with my implementation of the PLA in Java. What I did is generate two random points, and calculate the line through them in ax + by = 1 format.

After this I generate 100 random points, calculate y_n for each point, and put them in a 100x3 matrix.

I start with a weight vector of size 2 (one entry for calculating a, the other for b) with initial values of zero.

Now I loop through the random point matrix and apply the PLA update formula to the misclassified points.

The program stops when there are no misclassified points.

The problem I am facing is that for 100 points it frequently fails to converge, while for 10 points it converges in 2-3 iterations.

So this is my problem, professor; can you please help me catch what I am missing?

Thanks,
Trk.
#19
 butterscotch Caltech Join Date: Jan 2013 Posts: 43 Re: PLA - Need Guidance

The weight vector should include a bias term. In this case the dimension of the input is 2, so the weight vector will be of size 3. Right now you have ax + by = 1, while the "1" never gets updated. I suspect that with a small number of points the algorithm can converge without the bias term, but with more points (requiring higher accuracy for no points to be misclassified), it doesn't.
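The usual fix is to fold the bias into the weight vector by prepending a constant 1 to each input. The same idea carries over to Java; a Python sketch of just the augmentation and update:

```python
import numpy as np

rng = np.random.default_rng(2)

# Augment each input (x1, x2) with a leading 1 so the first weight acts
# as the bias; the decision boundary is w0 + w1*x1 + w2*x2 = 0.
raw = rng.uniform(-1, 1, size=(5, 2))
X = np.column_stack([np.ones(len(raw)), raw])  # shape (5, 3)

w = np.zeros(3)   # bias w0 is now learned along with w1 and w2
y_example = 1.0   # suppose point 0 is misclassified and its label is +1

# The PLA update w <- w + y*x now updates the bias term too,
# because x carries the constant 1 in its first coordinate.
w = w + y_example * X[0]
```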
#20
 gah44 Invited Guest Join Date: Jul 2012 Location: Seattle, WA Posts: 153 Re: PLA - Need Guidance

Quote:
 Originally Posted by jtwang How would you determine f(x) == g(x) exactly - since the set of possible hypotheses is infinite (3 reals), wouldn't Pr(f(x) != g(x)) == 1? Obviously you could choose some arbitrary epsilon but then that wouldn't be "exactly."
There are two lines, the original line that determines the separation between +1 and -1, and the line determined by the PLA. The questions ask what fraction of the space is different between the two lines. If they don't cross, that is the area between the two lines (divided by four, the total area). If they do cross, it is the area of two triangles (or, sometimes, quadrilaterals).

Each line crosses two of the sides of the square. (I suppose it could also go right through a corner, but that is unlikely.) Handling all the possible combinations of the two lines is a lot of work.
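The area-between-two-lines idea can also be checked numerically rather than case by case (a sketch assuming both lines are written as x2 = m*x1 + c, with hypothetical slopes and intercepts):

```python
import numpy as np

# Two lines x2 = m*x1 + c inside the square [-1,1] x [-1,1]
# (hypothetical example values).
m1, c1 = 0.5, 0.1   # target line f
m2, c2 = 0.4, 0.0   # PLA line g

# Average the vertical gap between the lines over the square, clipping each
# line to the square; area = mean gap * width 2, fraction = area / 4.
x = np.linspace(-1, 1, 200001)
y1 = np.clip(m1 * x + c1, -1, 1)
y2 = np.clip(m2 * x + c2, -1, 1)
disagreement_area = np.mean(np.abs(y1 - y2)) * 2.0
fraction = disagreement_area / 4.0
```

This handles the crossing and non-crossing cases uniformly, at the cost of being approximate rather than exact.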

In another thread I discussed how I did it, only counting, and computing the area of, the cases where both lines go through the top and bottom of the square. That is about 30% in my tests. By symmetry, there should also be 30% where both go through the left and right sides of the square. The remaining cases might have a little less area, but I based my answer on just the lines going through the top and bottom of the square. It seemed more interesting than the choose-random-points method.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.