LFD Book Forum *ANSWER* Question 9
#1
04-07-2013, 12:58 PM
 clarkfitzg Junior Member Join Date: Apr 2013 Posts: 6
*ANSWER* Question 9

Using the PLA on 100 points and counting the number of iterations required for convergence, I observed some interesting behavior. I set the maximum number of iterations to 1000 and then ran 10,000 simulations. This histogram shows the results:

The data at 1000 can be interpreted as those simulations that would have taken over 1000 iterations to converge. When I removed the limit on the maximum number of iterations and reran the experiment, one case took a very large number of iterations to converge. From this I conclude that the perceptron doesn't always converge rapidly.

In my experiment the mean number of iterations required for convergence was around 240, which is not particularly close to the possible answers of 100 and 500. The median, however, is certainly near 100. Do other people have similar results?
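For concreteness, here is a minimal sketch of this experiment in Python/NumPy (the setup follows the homework: a target line f through two random points in [-1, 1]^2, PLA updating on the first misclassified point; the function and variable names are my own):

```python
import numpy as np

def run_pla(N=100, max_iters=1000, rng=None):
    """Return the number of PLA iterations until convergence (capped)."""
    if rng is None:
        rng = np.random.default_rng()
    # Random target f: the line through two random points in [-1, 1]^2.
    p, q = rng.uniform(-1, 1, (2, 2))
    w_f = np.array([p[0]*q[1] - p[1]*q[0], p[1] - q[1], q[0] - p[0]])
    # N data points with a leading bias coordinate, labeled by f.
    X = np.column_stack([np.ones(N), rng.uniform(-1, 1, (N, 2))])
    y = np.sign(X @ w_f)
    w = np.zeros(3)                          # start from the zero weight vector
    for it in range(max_iters):
        mis = np.nonzero(np.sign(X @ w) != y)[0]
        if mis.size == 0:
            return it                        # converged: nothing misclassified
        i = mis[0]                           # "first misclassified point" rule
        w = w + y[i] * X[i]                  # PLA update
    return max_iters                         # hit the iteration cap

iters = [run_pla() for _ in range(200)]
print(np.mean(iters), np.median(iters))
```

With the cap at 1000, runs that return 1000 are exactly the simulations that would have needed more iterations, as described above.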
#2
04-08-2013, 06:06 AM
 kafar Junior Member Join Date: Apr 2013 Posts: 6
Re: *ANSWER* Question 9

Interestingly, I tried 10k runs with N = 100; the average number of iterations to converge was close to 500.

In my lazy implementation, I always pick the first misclassified point to update the weight vector. Looking at the theory, I don't see how that matters. But since I got this question wrong, I guess a randomly chosen misclassified point works better.
#3
04-08-2013, 08:15 AM
 Michael Reach Senior Member Join Date: Apr 2013 Location: Baltimore, Maryland, USA Posts: 71
Re: *ANSWER* Question 9

My mean was considerably smaller, but there is a very long tail. If you graph the lines at various stages, you can actually see the hypothesis line h jiggling back and forth as it gets closer to the "g" line. The size of the jiggles damps down as you go along. It needs to get close enough that none of your randomly chosen points is on the wrong side.

If you picture shifting the g line around as far as you can until it bumps into one of the points, you get the range of potential solution "h" lines. You'll see that how much room you have depends on the points; often you'll have lots of room. But one could imagine three points that are almost exactly collinear, with the g line passing just between them, so there's almost no room to jiggle! In a case like that, it can take an awfully long time for the jiggling of the h line to settle down enough that it doesn't keep jumping across that very narrow range.
#4
04-08-2013, 11:22 AM
 kafar Junior Member Join Date: Apr 2013 Posts: 6
Re: *ANSWER* Question 9

Quote:
 Originally Posted by Michael Reach My mean was considerably smaller, but there is a very long tail. If you graph the lines at various stages, you can actually see the h hypothesis line jiggling back and forth as it gets closer to the "g" line. The size of the jiggles damps down as you go along. It needs to get close enough so that none of your random chosen set of points is on the wrong side. If you picture shifting the g line around as far as you can till it bumps into one of the points, you get the range of potential solution "h" lines. You'll see that it depends on the initial points how much room you have for that; often you'll have lots of room. But one could imagine three points that are almost exactly colinear and the g line goes just between them - and there's almost no room to jiggle! In a case like that, it can take an awfully long time till the jiggling of the the h line settles down far enough so that it doesn't keep jumping across that very narrow range.
Very well said. I think it explains why PLA should randomly pick misclassified points for faster convergence. Perhaps past a certain point, the algorithm should pick the misclassified point closest to the current boundary so h won't jump back and forth too much.
#5
04-08-2013, 03:02 PM
 IsidroHidalgo Member Join Date: Apr 2013 Location: Toledo (Spain) Posts: 28
Re: *ANSWER* Question 9

I think this is an interesting question: does the choice of misclassified point at each iteration really matter? It's very easy to calculate the distance of each point and select the closest...
#6
04-08-2013, 03:52 PM
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143
Re: *ANSWER* Question 9

Quote:
 Originally Posted by clarkfitzg Using the PLA on 100 points and counting the number of iterations required for convergence I observed some interesting behavior. I set the max number of iterations to 1000 and then ran 10,000 simulations. This histogram shows the results: The data at 1000 can be interpreted as those simulations which would have taken over 1000 iterations to converge. When I removed the limit on the max number of iterations and ran the experiment I had one case take iterations to converge. From this I conclude that the perceptron doesn't always converge rapidly. In my experiment the mean number of iterations required for convergence was around 240, which is not particularly close to the possible answers of 100 and 500. The median, however, is certainly near 100. Do other people have similar results?
I suspect you did not choose your points randomly for the iterative step. If you, for example, always search sequentially through the points for a misclassified point, I believe it is not so uncommon to get caught in a loop of misclassified points. This must eventually resolve itself, but might take a while, and this leads to more long runs and a lot higher average number of iterations. I found that randomly picking misclassified points decreases the average number of iterations a lot and shortens the long tail, presumably by limiting repeated bad choices of misclassified point.
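To make this comparison concrete, the selection step can be isolated into a pluggable rule; a sketch in Python/NumPy (the dataset generator mirrors the homework setup; all names here are my own):

```python
import numpy as np

def random_dataset(N, rng):
    # Target f: the line through two random points in [-1, 1]^2;
    # data points carry a leading bias coordinate.
    p, q = rng.uniform(-1, 1, (2, 2))
    w_f = np.array([p[0]*q[1] - p[1]*q[0], p[1] - q[1], q[0] - p[0]])
    X = np.column_stack([np.ones(N), rng.uniform(-1, 1, (N, 2))])
    return X, np.sign(X @ w_f)

def pla_iterations(X, y, pick, max_iters=1000, rng=None):
    """Run PLA with a pluggable selection rule; return iterations used."""
    if rng is None:
        rng = np.random.default_rng()
    w = np.zeros(X.shape[1])
    for it in range(max_iters):
        mis = np.nonzero(np.sign(X @ w) != y)[0]
        if mis.size == 0:
            return it                       # converged
        i = pick(mis, rng)                  # the only thing that varies
        w = w + y[i] * X[i]
    return max_iters

first_rule  = lambda mis, rng: mis[0]           # sequential scan
random_rule = lambda mis, rng: rng.choice(mis)  # random pick

rng = np.random.default_rng(0)
for name, rule in [("first", first_rule), ("random", random_rule)]:
    runs = [pla_iterations(*random_dataset(100, rng), rule, rng=rng)
            for _ in range(200)]
    print(name, np.mean(runs))
```

Swapping `first_rule` for `random_rule` is the only change between the two experiments, so any difference in the mean and in the length of the tail comes from the selection rule alone.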
#7
04-08-2013, 07:26 PM
 kafar Junior Member Join Date: Apr 2013 Posts: 6
Re: *ANSWER* Question 9

Quote:
 Originally Posted by Elroch I suspect you did not choose your points randomly for the iterative step. If you, for example, always search sequentially through the points for a misclassified point, I believe it is not so uncommon to get caught in a loop of misclassified points. This must eventually resolve itself, but might take a while, and this leads to more long runs and a lot higher average number of iterations. I found that randomly picking misclassified points decreases the average number of iterations a lot and shortens the long tail, presumably by limiting repeated bad choices of misclassified point.
I see it from a different angle: I wouldn't call them bad choices of misclassified points, just less informative ones. I've been thinking about it a bit. If the misclassified points are picked randomly, the algorithm has a better chance of obtaining "knowledge" contained in the entire sample space, whereas obtaining knowledge by picking points sequentially is much slower, hence the slower convergence.

RE: IsidroHidalgo, it was just a guess; I didn't run the experiment to verify. I don't think it would be a good idea to look at distance from the start, but after the entire sample has been looked at, **maybe** closer points can be better choices. For the sake of this homework, randomly choosing misclassified points is the way to go.
#8
04-09-2013, 07:37 AM
 jlaurentum Member Join Date: Apr 2013 Location: Venezuela Posts: 41
Re: *ANSWER* Question 9

Interesting observation about randomly choosing the misclassified point, Elroch.
If I had done more iterations, like some did above, I would probably have answered question 9 incorrectly as well; it seems there are too many outliers in the iteration counts when taking the first misclassified point.

How about taking the misclassified point that errs farthest in the inner product? That is, taking the misclassified point for which abs(w*x) is largest. What do you guys think?
#9
04-09-2013, 08:15 AM
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143
Re: *ANSWER* Question 9

Quote:
 Originally Posted by jlaurentum Interesting observation about randomly chosing the misclassified point, Elroch. If I'd have done more iterations, like some did above, I'd probably have answered question 9 incorrectly as well- it seems there are too many outliers for "iterations" when taking the first misclassified point.
Lucky fellow!

Quote:
 How about taking the misclassified point that errs farther off in the inner product? that is, taking the misclassified point for which abs(w*x) is largest? What do you guys think?
I had the same thought, but have not implemented it. Has anyone?
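For what it's worth, inside a PLA loop that already has the indices of the misclassified points, the rule jlaurentum describes is essentially a one-liner; a sketch (I haven't tested whether it actually speeds up convergence, and the function name is my own):

```python
import numpy as np

def largest_violation_index(X, w, mis):
    # Among the misclassified indices `mis`, return the one for which
    # |w . x| is largest. Note: with the initial w = 0 every score is 0,
    # so the first pick degenerates to mis[0].
    scores = np.abs(X[mis] @ w)
    return mis[int(np.argmax(scores))]
```

This drops into any PLA implementation in place of the first-point or random pick; whether it helps or hurts the iteration count would need the same kind of simulation discussed above.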
#10
04-09-2013, 06:24 PM
 clarkfitzg Junior Member Join Date: Apr 2013 Posts: 6
Re: *ANSWER* Question 9

Thanks for the insightful replies! Indeed I was choosing the first misclassified point every time. I updated my code to choose randomly and produced the following results:

The tail past 1000 is visibly smaller and the mean is much closer to the correct answer of 100 when the misclassified point is randomly chosen.
