PLA: Is randomly selecting the update important?
The official description of the perceptron learning algorithm says that we randomly select an incorrectly classified point, and then update to nudge that point closer to being correctly classified.
If we actually randomly select the next point from the uniform distribution of misclassified points, we're going to have to do a lot of annoying bookkeeping, because every single time we update, we'll have to recheck every single point, figure out which ones are now misclassified (remembering that a previously correctly classified point can now be misclassified after update), and randomly pick one of the bad points for the next update. Now, obviously, if the PLA is going to converge, it will converge no matter how we pick the next misclassified point, as long as we eventually get to every misclassified point. How much difference does it make if we select our update points nonrandomly? I'll do the random update for the homework, just in case it makes a difference, but I suspect it doesn't on average. 
Re: PLA: Is randomly selecting the update important?
I had originally chosen the first mismatched point instead of a randomly selected one. For N=10 there wasn't much difference, but for N=100 the random choice performed better (to the point where it could affect the selected answer).
Also, though I wasn't able to make it happen, it seems the algorithm may not converge if next point is not selected randomly (could get stuck in some sort of infinite loop). 
Re: PLA: Is randomly selecting the update important?
No, the algorithm will always converge eventually if there is linear separation, no matter how you pick the misclassified point. And the proof isn't that difficult.
But now we have one person in one thread saying that they found picking the points at random gave better results, and another person in another thread saying picking the first misclassified point gave better results. Mysterious. My strategy would be to start with the last misclassified point, the one I just updated with; try it again if it is still misclassified, and then pick the next misclassified point in the list. 
Re: PLA: Is randomly selecting the update important?
Thanks, I'll have to check out the proof for myself, but trust you're right. I assume the book contains a proof, just haven't gotten my hands on copy yet (in transit).
As for performance between selecting 1st mismatch vs. random mismatch, I doublechecked by running my code both ways a few times and random choice took considerably fewer iterations. Hopefully the other poster mixed it up when typing... 
Re: PLA: Is randomly selecting the update important?
I don't recall if the book contains a proof of PLA convergence or not. If not, there's an outline of one in the slides for lecture 2 of Geoffrey Hinton's neural network course at Coursera. Here's a link to those slides: lecture 2. The relevant slides are in parts 2c and 2d.

Re: PLA: Is randomly selecting the update important?
The proof for the PLA convergence is given as a guided problem 1.3 (Enjoy!)
The proof applies no matter which misclassified point you pick at each iteration. In practice I have not observed significant differences between first misclassified and random misclassified. As pointed out picking a misclassified point requires some overhead. In CSspeak, it takes O(N) time. To pick the first one (first according to index), make a pass through the data picking the first point that is misclassified. For a random one, do the same but passing through the data in a random order. Quote:

Re: PLA: Is randomly selecting the update important?
Quote:
For PLA convergence, the random selection doesn't matter much, and the observed differences can be datadependent. You may want to be aware that the PLA is a very special algorithm, though. In most of the learning algorithms that you will see in the future, the "selection" shall better be of a random or some more clever strategy  an arbitrary choice may not always work well. Hope this helps. 
Re: PLA: Is randomly selecting the update important?
You can do a (virtual) random permutation of the points,
That's got to be O(N) at least, though. 
Re: PLA: Is randomly selecting the update important?
Quote:

Re: PLA: Is randomly selecting the update important?
I think the algorithm "moves in the right direction" on the last point, though I think the sign of the dot product could still be incorrect after an iteration (this is what Exc.1.3 in the book walks you through). However, there's no guarantee that improvement will ever get you there (could be monotone sequence of increasingly small steps, though I think it's easy enough to convince oneself this won't happen) or that this point will not get "messed up" later in another iteration when you are taking care of a different misclassified point.
Problem (as opposed to "exercise") 1.3 of chapter one, as magdon pointed out, steps you through a full proof of PLA convergence. I haven't attempted it yet, but seems pretty involved. 
All times are GMT 7. The time now is 05:49 AM. 
Powered by vBulletin® Version 3.8.3
Copyright ©2000  2019, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. AbuMostafa, Malik MagdonIsmail, and HsuanTien Lin, and participants in the Learning From Data MOOC by Yaser S. AbuMostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.