LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 1 (http://book.caltech.edu/bookforum/forumdisplay.php?f=130)

 clarkfitzg 04-07-2013 01:58 PM

Using the PLA on 100 points and counting the number of iterations required for convergence, I observed some interesting behavior. I set the maximum number of iterations to 1000 and then ran 10,000 simulations. This histogram shows the results:
http://s23.postimg.org/57j40l46j/iterations_page_1.jpg
The data at 1000 can be interpreted as those simulations which would have taken over 1000 iterations to converge. When I removed the limit on the maximum number of iterations and reran the experiment, one case took an extremely large number of iterations to converge. From this I conclude that the perceptron doesn't always converge rapidly.

In my experiment the mean number of iterations required for convergence was around 240, which is not particularly close to the possible answers of 100 and 500. The median, however, is certainly near 100. Do other people have similar results?
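An experiment along these lines can be sketched in Python with NumPy. This is only a minimal sketch, not the poster's actual code: it assumes points drawn uniformly from [-1, 1]^2, a target line through two random points, and a random choice among misclassified points; the name `run_pla` is my own.

```python
import numpy as np

def run_pla(n_points=100, rng=None, max_iters=100000):
    """One PLA run on a random linearly separable data set in [-1, 1]^2.

    Returns the number of update iterations until every point is
    classified correctly (or max_iters if the cap is hit first).
    """
    rng = rng or np.random.default_rng()
    # Target line f through two random points p, q in [-1, 1]^2,
    # expressed as weights on the augmented input (1, x1, x2).
    p, q = rng.uniform(-1, 1, (2, 2))
    w_f = np.array([p[0] * q[1] - p[1] * q[0], p[1] - q[1], q[0] - p[0]])
    X = np.hstack([np.ones((n_points, 1)), rng.uniform(-1, 1, (n_points, 2))])
    y = np.sign(X @ w_f)                 # labels are separable by construction
    w = np.zeros(3)                      # start from the zero hypothesis
    for t in range(max_iters):
        mis = np.flatnonzero(np.sign(X @ w) != y)
        if mis.size == 0:
            return t                     # converged after t updates
        i = rng.choice(mis)              # random misclassified point
        w += y[i] * X[i]                 # PLA update rule
    return max_iters

rng = np.random.default_rng(0)
iters = [run_pla(100, rng) for _ in range(100)]
print(f"mean: {np.mean(iters):.0f}, median: {np.median(iters):.0f}")
```

Repeating this for 10,000 runs (and histogramming `iters`) reproduces the kind of skewed distribution described above, where the mean sits well above the median.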

 kafar 04-08-2013 07:06 AM

interestingly, i tried 10k runs of N=100. the avg number of iterations to converge was close to 500.

In my lazy implementation, I always pick the first misclassified point to update the weight vector. Looking at the theory, I don't see how that matters. But since I got this question wrong, I guess a randomly chosen misclassified point would work better.

 Michael Reach 04-08-2013 09:15 AM

My mean was considerably smaller, but there is a very long tail. If you graph the lines at various stages, you can actually see the h hypothesis line jiggling back and forth as it gets closer to the "g" line. The size of the jiggles damps down as you go along. It needs to get close enough so that none of your randomly chosen set of points is on the wrong side.

If you picture shifting the g line around as far as you can till it bumps into one of the points, you get the range of potential solution "h" lines. You'll see that it depends on the initial points how much room you have for that; often you'll have lots of room. But one could imagine three points that are almost exactly collinear and the g line goes just between them - and there's almost no room to jiggle! In a case like that, it can take an awfully long time till the jiggling of the h line settles down far enough so that it doesn't keep jumping across that very narrow range.

 kafar 04-08-2013 12:22 PM

Quote:
 Originally Posted by Michael Reach (Post 10228) My mean was considerably smaller, but there is a very long tail. If you graph the lines at various stages, you can actually see the h hypothesis line jiggling back and forth as it gets closer to the "g" line. The size of the jiggles damps down as you go along. It needs to get close enough so that none of your randomly chosen set of points is on the wrong side. If you picture shifting the g line around as far as you can till it bumps into one of the points, you get the range of potential solution "h" lines. You'll see that it depends on the initial points how much room you have for that; often you'll have lots of room. But one could imagine three points that are almost exactly collinear and the g line goes just between them - and there's almost no room to jiggle! In a case like that, it can take an awfully long time till the jiggling of the h line settles down far enough so that it doesn't keep jumping across that very narrow range.
very well said. i think it explains why PLA should randomly pick misclassified points for faster convergence. Perhaps at a certain point, the algorithm should pick the point which is closest to f so h won't jump back and forth too much.

 IsidroHidalgo 04-08-2013 04:02 PM

I think this is an interesting question: does it really matter which misclassified point is chosen at each iteration? It's very easy to calculate the distance of each point and select the closest...

 Elroch 04-08-2013 04:52 PM

Quote:
 Originally Posted by clarkfitzg (Post 10207) Using the PLA on 100 points and counting the number of iterations required for convergence, I observed some interesting behavior. I set the maximum number of iterations to 1000 and then ran 10,000 simulations. This histogram shows the results: http://s23.postimg.org/57j40l46j/iterations_page_1.jpg The data at 1000 can be interpreted as those simulations which would have taken over 1000 iterations to converge. When I removed the limit on the maximum number of iterations and reran the experiment, one case took an extremely large number of iterations to converge. From this I conclude that the perceptron doesn't always converge rapidly. In my experiment the mean number of iterations required for convergence was around 240, which is not particularly close to the possible answers of 100 and 500. The median, however, is certainly near 100. Do other people have similar results?
I suspect you did not choose your points randomly for the iterative step. If you, for example, always search sequentially through the points for a misclassified point, I believe it is not so uncommon to get caught in a loop of misclassified points. This must eventually resolve itself, but might take a while, and this leads to more long runs and a lot higher average number of iterations. I found that randomly picking misclassified points decreases the average number of iterations a lot and shortens the long tail, presumably by limiting repeated bad choices of misclassified point.

 kafar 04-08-2013 08:26 PM

Quote:
 Originally Posted by Elroch (Post 10245) I suspect you did not choose your points randomly for the iterative step. If you, for example, always search sequentially through the points for a misclassified point, I believe it is not so uncommon to get caught in a loop of misclassified points. This must eventually resolve itself, but might take a while, and this leads to more long runs and a lot higher average number of iterations. I found that randomly picking misclassified points decreases the average number of iterations a lot and shortens the long tail, presumably by limiting repeated bad choices of misclassified point.
i see it from a different angle. i wouldn't call them bad choices of misclassified points, just less informative ones. I've been thinking about it a bit. If the misclassified points are picked randomly, the algorithm has a better chance of obtaining the "knowledge" contained in the entire sample space, whereas obtaining that knowledge by picking points sequentially is always much slower. Hence the slower convergence.

RE: IsidroHidalgo, it was just a guess; i didn't run the experiment to verify. i don't think it would be a good idea to look at distance from the start, but after the entire sample has been looked at, **maybe** closer points can be better choices. for the sake of this homework, randomly chosen misclassified points are the way to go.

 jlaurentum 04-09-2013 08:37 AM

Interesting observation about randomly choosing the misclassified point, Elroch.
If I'd done more iterations, like some did above, I'd probably have answered question 9 incorrectly as well: it seems there are too many outliers for "iterations" when taking the first misclassified point.

How about taking the misclassified point that errs farthest in the inner product? That is, taking the misclassified point for which abs(w*x) is largest. What do you guys think?

 Elroch 04-09-2013 09:15 AM

Quote:
 Originally Posted by jlaurentum (Post 10275) Interesting observation about randomly choosing the misclassified point, Elroch. If I'd done more iterations, like some did above, I'd probably have answered question 9 incorrectly as well: it seems there are too many outliers for "iterations" when taking the first misclassified point.
Lucky fellow!

Quote:
 How about taking the misclassified point that errs farthest in the inner product? That is, taking the misclassified point for which abs(w*x) is largest. What do you guys think?
I had the same thought, but have not implemented it. Has anyone?
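The selection rule jlaurentum proposes could be sketched like this (a hypothetical helper of my own, not from the thread, assuming NumPy and the augmented representation where `w` and each row of `X` include the bias coordinate):

```python
import numpy as np

def pick_largest_violation(X, y, w):
    """Among misclassified points, return the index with the largest |w.x|,
    i.e. the point whose signal is farthest on the wrong side.

    Returns None if no point is misclassified (PLA has converged).
    """
    mis = np.flatnonzero(np.sign(X @ w) != y)
    if mis.size == 0:
        return None
    return mis[np.argmax(np.abs(X[mis] @ w))]
```

Plugging this into a PLA loop in place of a random choice would let one test whether updating on the worst violator speeds up or slows down convergence.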

 clarkfitzg 04-09-2013 07:24 PM