LFD Book Forum PLA: Is randomly selecting the update important?

#1
01-09-2013, 12:00 PM
 Anne Paulson Senior Member Join Date: Jan 2013 Location: Silicon Valley Posts: 52
PLA: Is randomly selecting the update important?

The official description of the perceptron learning algorithm says that we randomly select an incorrectly classified point, and then update to nudge that point closer to being correctly classified.

If we actually select the next point uniformly at random from the misclassified points, we're going to have to do a lot of annoying bookkeeping: every single time we update, we'll have to recheck every single point, figure out which ones are now misclassified (remembering that a previously correctly classified point can become misclassified after an update), and randomly pick one of the bad points for the next update.

Now, obviously, if the PLA is going to converge, it will converge no matter how we pick the next misclassified point, as long as we eventually get to every misclassified point. How much difference does it make if we select our update points non-randomly? I'll do the random update for the homework, just in case it makes a difference, but I suspect it doesn't on average.
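For concreteness, the bookkeeping I'm describing might look something like this (a minimal NumPy sketch, not homework code; it assumes X already includes a bias column and y has labels in {-1, +1}):

```python
import numpy as np

rng = np.random.default_rng(0)

def pla_random(X, y, max_iters=10000):
    """PLA with a uniformly random misclassified point at each step.

    X: (N, d) inputs with a bias column, y: (N,) labels in {-1, +1}.
    Returns (weights, number of updates performed).
    """
    w = np.zeros(X.shape[1])
    for t in range(max_iters):
        # The bookkeeping: recheck every point after each update.
        mis = np.flatnonzero(np.sign(X @ w) != y)
        if mis.size == 0:
            return w, t          # converged: nothing is misclassified
        i = rng.choice(mis)      # uniform over currently misclassified points
        w = w + y[i] * X[i]      # the PLA update
    return w, max_iters
```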
#2
01-09-2013, 12:12 PM
 Yann222 Member Join Date: Jan 2013 Location: SoCal Posts: 16
Re: PLA: Is randomly selecting the update important?

I had originally chosen the first mismatched point instead of a randomly selected one. For N=10 there wasn't much difference, but for N=100 the random choice performed better (to the point where it could affect the selected answer).

Also, though I wasn't able to make it happen, it seems the algorithm might not converge if the next point is not selected randomly (it could get stuck in some sort of infinite loop).
#3
01-09-2013, 12:24 PM
 Anne Paulson Senior Member Join Date: Jan 2013 Location: Silicon Valley Posts: 52
Re: PLA: Is randomly selecting the update important?

No, the algorithm will always converge eventually if the data are linearly separable, no matter how you pick the misclassified point. And the proof isn't that difficult.

But now we have one person in one thread saying that they found picking the points at random gave better results, and another person in another thread saying picking the first misclassified point gave better results. Mysterious.

My strategy would be to start with the last misclassified point, the one I just updated with; try it again if it is still misclassified, and then pick the next misclassified point in the list.
#4
01-09-2013, 12:52 PM
 Yann222 Member Join Date: Jan 2013 Location: SoCal Posts: 16
Re: PLA: Is randomly selecting the update important?

Thanks, I'll have to check out the proof for myself, but I trust you're right. I assume the book contains a proof; I just haven't gotten my hands on a copy yet (it's in transit).

As for performance between selecting the first mismatch vs. a random mismatch, I double-checked by running my code both ways a few times, and the random choice took considerably fewer iterations. Hopefully the other poster mixed it up when typing...
#5
01-09-2013, 01:19 PM
 tzs29970 Invited Guest Join Date: Apr 2012 Posts: 52
Re: PLA: Is randomly selecting the update important?

I don't recall whether the book contains a proof of PLA convergence. If not, there's an outline of one in the slides for lecture 2 of Geoffrey Hinton's neural network course on Coursera; the relevant slides are in parts 2c and 2d.
#6
01-09-2013, 01:55 PM
 magdon RPI Join Date: Aug 2009 Location: Troy, NY, USA. Posts: 595
Re: PLA: Is randomly selecting the update important?

The proof of PLA convergence is given as guided Problem 1.3 (Enjoy!)

The proof applies no matter which misclassified point you pick at each iteration.

In practice I have not observed significant differences between first misclassified and random misclassified.

As pointed out, picking a misclassified point requires some overhead: in CS-speak, it takes O(N) time. To pick the first one (first according to index), make a pass through the data and take the first point that is misclassified. For a random one, do the same, but pass through the data in a random order.
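The two O(N) picks described above could be sketched like this (a NumPy sketch with labels in {-1, +1}; the function names are mine):

```python
import numpy as np

def first_misclassified(X, y, w):
    """First misclassified point by index: one O(N) pass, stop at the first hit."""
    for i in range(len(y)):
        if np.sign(X[i] @ w) != y[i]:
            return i
    return None  # no misclassified points: PLA has converged

def random_misclassified(X, y, w, rng):
    """Uniformly random misclassified point: the same pass, in a random visit order."""
    for i in rng.permutation(len(y)):
        if np.sign(X[i] @ w) != y[i]:
            return int(i)
    return None
```

By symmetry of the random permutation, the first misclassified point encountered in the shuffled order is uniform over all currently misclassified points, so no explicit list of bad points needs to be maintained.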

Quote:
 Originally Posted by Yann222 Thanks, I'll have to check out the proof for myself, but trust you're right. I assume the book contains a proof, just haven't gotten my hands on copy yet (in transit). As for performance between selecting 1st mismatch vs. random mismatch, I double-checked by running my code both ways a few times and random choice took considerably fewer iterations. Hopefully the other poster mixed it up when typing...
__________________
Have faith in probability
#7
01-09-2013, 02:31 PM
 htlin NTU Join Date: Aug 2009 Location: Taipei, Taiwan Posts: 601
Re: PLA: Is randomly selecting the update important?

Quote:
 Originally Posted by Anne Paulson If we actually randomly select the next point from the uniform distribution of misclassified points, we're going to have to do a lot of annoying bookkeeping, because every single time we update, we'll have to recheck every single point, figure out which ones are now misclassified (remembering that a previously correctly classified point can now be misclassified after update), and randomly pick one of the bad points for the next update.
You can apply a (virtual) random permutation to the points up front; then selecting the "next" misclassified point in the permuted order plays the role of selecting a misclassified point uniformly at random.
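If I understand the trick correctly, it amounts to something like this sketch (shuffle once up front, then always take the first misclassified point in the shuffled order; the names are mine):

```python
import numpy as np

def pla_permuted(X, y, rng, max_iters=10000):
    """Shuffle the data once, then always take the first misclassified
    point in the shuffled order (no per-step random draw needed)."""
    perm = rng.permutation(len(y))
    Xp, yp = X[perm], y[perm]
    w = np.zeros(X.shape[1])
    for t in range(max_iters):
        mis = np.flatnonzero(np.sign(Xp @ w) != yp)
        if mis.size == 0:
            return w, t
        w = w + yp[mis[0]] * Xp[mis[0]]  # first misclassified in shuffled order
    return w, max_iters
```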

For PLA convergence, the random selection doesn't matter much, and the observed differences can be data-dependent. Be aware, though, that the PLA is a very special algorithm. In most of the learning algorithms you will see in the future, the "selection" should be random or follow some more clever strategy; an arbitrary choice may not always work well.

Hope this helps.
__________________
When one teaches, two learn.
#8
01-09-2013, 05:18 PM
 Anne Paulson Senior Member Join Date: Jan 2013 Location: Silicon Valley Posts: 52
Re: PLA: Is randomly selecting the update important?

Quote:
 Originally Posted by htlin You can do a (virtual) random permutation of the points,
That's got to be O(N) at least, though.
#9
01-10-2013, 05:47 AM
 tathagata Junior Member Join Date: Jan 2013 Posts: 9
Re: PLA: Is randomly selecting the update important?

Quote:
 Originally Posted by Anne Paulson No, the algorithm will always converge eventually if there is linear separation, no matter how you pick the misclassified point. And the proof isn't that difficult. But now we have one person in one thread saying that they found picking the points at random gave better results, and another person in another thread saying picking the first misclassified point gave better results. Mysterious. My strategy would be to start with the last misclassified point, the one I just updated with; try it again if it is still misclassified, and then pick the next misclassified point in the list.
If you are referring to my post, then I'm sorry, it was a typo: random selection was consistently faster. About your approach: doesn't the update rule guarantee that the last point is now classified correctly?
#10
01-10-2013, 08:35 AM
 Yann222 Member Join Date: Jan 2013 Location: SoCal Posts: 16
Re: PLA: Is randomly selecting the update important?

I think the algorithm "moves in the right direction" on the last point, but the sign of the dot product can still be wrong after one update (this is what Exercise 1.3 in the book walks you through). There's also no guarantee that repeated improvement will ever flip the sign (it could be a monotone sequence of ever-smaller steps, though it's easy enough to convince oneself that won't happen here), and the point can get "messed up" later when a different misclassified point is updated.

Problem (as opposed to "exercise") 1.3 of chapter one, as magdon pointed out, steps you through a full proof of PLA convergence. I haven't attempted it yet, but it seems pretty involved.
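The "right direction, possibly wrong sign" behavior is easy to check numerically (toy numbers of my own, not from the book; a point is misclassified when y * (w @ x) < 0):

```python
import numpy as np

# A point misclassified by w: y * (w @ x) < 0.
w = np.array([0.0, -5.0])
x = np.array([1.0, 1.0])
y = 1.0

before = y * (w @ x)      # -5.0: misclassified
w_new = w + y * x         # one PLA update
after = y * (w_new @ x)   # before + ||x||^2 = -3.0

assert after > before  # the update moved in the right direction...
assert after < 0       # ...but the point is still misclassified
```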
