Quote:
Originally Posted by Anne Paulson
If we actually randomly select the next point from the uniform distribution of misclassified points, we're going to have to do a lot of annoying bookkeeping, because every single time we update, we'll have to recheck every single point, figure out which ones are now misclassified (remembering that a previously correctly classified point can now be misclassified after update), and randomly pick one of the bad points for the next update.

You can do a (virtual) random permutation of the points up front. Selecting a misclassified point uniformly at random is then equivalent to selecting the "next" misclassified point in the permuted order, exactly as you would scan the points when not using any random permutation.
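To make the idea concrete, here is a minimal sketch in plain Python, not a reference implementation. The names `pla_random_order`, `rng` and `max_updates` are my own for illustration, and I am assuming a fresh shuffle before each update so that the first misclassified point found is uniformly distributed over the currently misclassified points. I also assume sign(0) counts as -1.

```python
import random

def sign(x):
    # Assumption: a score of exactly 0 is treated as class -1.
    return 1 if x > 0 else -1

def pla_random_order(points, labels, rng, max_updates=10_000):
    """Perceptron learning, visiting points in a random order.

    points: list of feature tuples (a bias coordinate 1.0 is prepended here)
    labels: list of +1 / -1
    Returns (weights, number_of_updates).
    """
    xs = [(1.0,) + tuple(p) for p in points]
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    for updates in range(max_updates):
        rng.shuffle(order)  # fresh uniform permutation before each scan
        # Take the first misclassified point in the permuted order.
        bad = next(
            (i for i in order
             if sign(sum(wj * xj for wj, xj in zip(w, xs[i]))) != labels[i]),
            None,
        )
        if bad is None:
            return w, updates  # every point is classified correctly
        # Standard PLA update: w <- w + y * x
        w = [wj + labels[bad] * xj for wj, xj in zip(w, xs[bad])]
    raise RuntimeError("no convergence within max_updates; data may not be separable")

# Tiny linearly separable toy set (made up for this example).
points = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -0.5)]
labels = [1, 1, -1, -1]
w, n_updates = pla_random_order(points, labels, random.Random(0))
```

Note that there is no list of "currently bad" points to maintain: each scan just stops at the first misclassified point it meets. Shuffling once and cycling through that fixed order is cheaper and would be a reasonable variant of the same sketch.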
For PLA convergence, the random selection doesn't matter much, and the observed differences can be data-dependent. You may want to be aware that the PLA is a very special algorithm, though. In most of the learning algorithms that you will see in the future, the "selection" had better follow a random or some more clever strategy; an arbitrary choice may not always work well.
Hope this helps.