LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 1 (http://book.caltech.edu/bookforum/forumdisplay.php?f=130)
-   -   PLA: Is randomly selecting the update important? (http://book.caltech.edu/bookforum/showthread.php?t=3822)

Anne Paulson 01-09-2013 01:00 PM

PLA: Is randomly selecting the update important?
 
The official description of the perceptron learning algorithm says that we randomly select an incorrectly classified point, and then update to nudge that point closer to being correctly classified.

If we actually randomly select the next point from the uniform distribution of misclassified points, we're going to have to do a lot of annoying bookkeeping, because every single time we update, we'll have to recheck every single point, figure out which ones are now misclassified (remembering that a previously correctly classified point can now be misclassified after update), and randomly pick one of the bad points for the next update.

Now, obviously, if the PLA is going to converge, it will converge no matter how we pick the next misclassified point, as long as we eventually get to every misclassified point. How much difference does it make if we select our update points non-randomly? I'll do the random update for the homework, just in case it makes a difference, but I suspect it doesn't on average.

Yann222 01-09-2013 01:12 PM

Re: PLA: Is randomly selecting the update important?
 
I had originally chosen the first mismatched point instead of a randomly selected one. For N=10 there wasn't much difference, but for N=100 the random choice performed better (to the point where it could affect the selected answer).

Also, though I wasn't able to make it happen, it seems the algorithm may not converge if next point is not selected randomly (could get stuck in some sort of infinite loop).

Anne Paulson 01-09-2013 01:24 PM

Re: PLA: Is randomly selecting the update important?
 
No, the algorithm will always converge eventually if there is linear separation, no matter how you pick the misclassified point. And the proof isn't that difficult.

But now we have one person in one thread saying that they found picking the points at random gave better results, and another person in another thread saying picking the first misclassified point gave better results. Mysterious.

My strategy would be to start with the last misclassified point, the one I just updated with; try it again if it is still misclassified, and then pick the next misclassified point in the list.

Yann222 01-09-2013 01:52 PM

Re: PLA: Is randomly selecting the update important?
 
Thanks, I'll have to check out the proof for myself, but trust you're right. I assume the book contains a proof, just haven't gotten my hands on copy yet (in transit).

As for performance between selecting 1st mismatch vs. random mismatch, I double-checked by running my code both ways a few times and random choice took considerably fewer iterations. Hopefully the other poster mixed it up when typing...

tzs29970 01-09-2013 02:19 PM

Re: PLA: Is randomly selecting the update important?
 
I don't recall if the book contains a proof of PLA convergence or not. If not, there's an outline of one in the slides for lecture 2 of Geoffrey Hinton's neural network course at Coursera. Here's a link to those slides: lecture 2. The relevant slides are in parts 2c and 2d.

magdon 01-09-2013 02:55 PM

Re: PLA: Is randomly selecting the update important?
 
The proof for the PLA convergence is given as a guided problem 1.3 (Enjoy!)

The proof applies no matter which misclassified point you pick at each iteration.

In practice I have not observed significant differences between first misclassified and random misclassified.

As pointed out picking a misclassified point requires some overhead. In CS-speak, it takes O(N) time. To pick the first one (first according to index), make a pass through the data picking the first point that is misclassified. For a random one, do the same but passing through the data in a random order.

Quote:

Originally Posted by Yann222 (Post 8491)
Thanks, I'll have to check out the proof for myself, but trust you're right. I assume the book contains a proof, just haven't gotten my hands on copy yet (in transit).

As for performance between selecting 1st mismatch vs. random mismatch, I double-checked by running my code both ways a few times and random choice took considerably fewer iterations. Hopefully the other poster mixed it up when typing...


htlin 01-09-2013 03:31 PM

Re: PLA: Is randomly selecting the update important?
 
Quote:

Originally Posted by Anne Paulson (Post 8486)
If we actually randomly select the next point from the uniform distribution of misclassified points, we're going to have to do a lot of annoying bookkeeping, because every single time we update, we'll have to recheck every single point, figure out which ones are now misclassified (remembering that a previously correctly classified point can now be misclassified after update), and randomly pick one of the bad points for the next update.

You can do a (virtual) random permutation of the points, and then selecting a misclassified point uniformly at random is equivalent to selecting the "next" misclassified point (when not using any random permutation).

For PLA convergence, the random selection doesn't matter much, and the observed differences can be data-dependent. You may want to be aware that the PLA is a very special algorithm, though. In most of the learning algorithms that you will see in the future, the "selection" shall better be of a random or some more clever strategy --- an arbitrary choice may not always work well.

Hope this helps.

Anne Paulson 01-09-2013 06:18 PM

Re: PLA: Is randomly selecting the update important?
 
You can do a (virtual) random permutation of the points,

That's got to be O(N) at least, though.

tathagata 01-10-2013 06:47 AM

Re: PLA: Is randomly selecting the update important?
 
Quote:

Originally Posted by Anne Paulson (Post 8488)
No, the algorithm will always converge eventually if there is linear separation, no matter how you pick the misclassified point. And the proof isn't that difficult.

But now we have one person in one thread saying that they found picking the points at random gave better results, and another person in another thread saying picking the first misclassified point gave better results. Mysterious.

My strategy would be to start with the last misclassified point, the one I just updated with; try it again if it is still misclassified, and then pick the next misclassified point in the list.

If you are referring to my post then I am sorry, it was a typo, randomization is faster always. About your approach, I think it is necessary that the last point is always classified correctly due to the update rule, isn't it?

Yann222 01-10-2013 09:35 AM

Re: PLA: Is randomly selecting the update important?
 
I think the algorithm "moves in the right direction" on the last point, though I think the sign of the dot product could still be incorrect after an iteration (this is what Exc.1.3 in the book walks you through). However, there's no guarantee that improvement will ever get you there (could be monotone sequence of increasingly small steps, though I think it's easy enough to convince oneself this won't happen) or that this point will not get "messed up" later in another iteration when you are taking care of a different misclassified point.

Problem (as opposed to "exercise") 1.3 of chapter one, as magdon pointed out, steps you through a full proof of PLA convergence. I haven't attempted it yet, but seems pretty involved.


All times are GMT -7. The time now is 05:49 AM.

Powered by vBulletin® Version 3.8.3
Copyright ©2000 - 2019, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.