The proof for the PLA convergence is given as a guided problem 1.3 (Enjoy!)
The proof applies no matter which misclassified point you pick at each iteration.
In practice I have not observed significant differences between first misclassified and random misclassified.
As pointed out picking a misclassified point requires some overhead. In CSspeak, it takes O(N) time. To pick the first one (first according to index), make a pass through the data picking the first point that is misclassified. For a random one, do the same but passing through the data in a random order.
Quote:
Originally Posted by Yann222
Thanks, I'll have to check out the proof for myself, but trust you're right. I assume the book contains a proof, just haven't gotten my hands on copy yet (in transit).
As for performance between selecting 1st mismatch vs. random mismatch, I doublechecked by running my code both ways a few times and random choice took considerably fewer iterations. Hopefully the other poster mixed it up when typing...
