#1
The official description of the perceptron learning algorithm says that we randomly select an incorrectly classified point, and then update to nudge that point closer to being correctly classified.
If we actually select the next point uniformly at random from the misclassified points, we have to do a lot of annoying bookkeeping: every time we update, we must recheck every single point to figure out which ones are now misclassified (remembering that a previously correct point can become misclassified after an update), and then randomly pick one of the bad points for the next update.

Now, obviously, if the PLA is going to converge, it will converge no matter how we pick the next misclassified point, as long as we eventually get to every misclassified point. How much difference does it make if we select our update points non-randomly? I'll do the random update for the homework, just in case it makes a difference, but I suspect it doesn't on average.
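For concreteness, here's a minimal sketch of that bookkeeping (my own code and names, not the course's; it assumes X carries a leading column of ones for the bias and y holds +/-1 labels). The misclassified set is recomputed from scratch after every single update:

Code:
import numpy as np

def pla_random(X, y, rng, max_updates=10_000):
    # X: (N, d+1) with a leading column of ones (bias); y: (N,) of +/-1.
    w = np.zeros(X.shape[1])
    for t in range(max_updates):
        # The annoying bookkeeping: rescan every point after each update,
        # since fixing one point can break a previously correct one.
        miss = np.where(np.sign(X @ w) != y)[0]
        if miss.size == 0:
            return w, t               # nothing misclassified: converged
        i = rng.choice(miss)          # uniform over misclassified points
        w = w + y[i] * X[i]           # nudge w toward classifying i correctly
    return w, max_updates

Called with, say, rng = np.random.default_rng(0).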
#2
I had originally chosen the first mismatched point instead of a randomly selected one. For N=10 there wasn't much difference, but for N=100 the random choice performed better (to the point where it could affect the selected answer).
Also, though I wasn't able to make it happen, it seems the algorithm might not converge if the next point is not selected randomly (it could get stuck in some sort of infinite loop).
#3
No, the algorithm will always converge eventually if the data are linearly separable, no matter how you pick the misclassified point. And the proof isn't that difficult.
But now we have one person in one thread saying that picking the points at random gave better results, and another person in another thread saying that picking the first misclassified point gave better results. Mysterious. My strategy would be to start with the last misclassified point, the one I just updated with: try it again while it is still misclassified, and then move on to the next misclassified point in the list.
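That strategy might look like this (a sketch under the same X/y conventions as the code in post #1; the helper name is mine):

Code:
import numpy as np

def pla_retry_last(X, y, max_updates=10_000):
    w = np.zeros(X.shape[1])
    last = -1                          # index of the point last updated on
    for t in range(max_updates):
        if last >= 0 and np.sign(X[last] @ w) != y[last]:
            i = last                   # still misclassified: try it again
        else:
            # Otherwise take the next misclassified point, scanning from
            # just past `last` and wrapping around.
            order = np.roll(np.arange(len(y)), -(last + 1))
            bad = [j for j in order if np.sign(X[j] @ w) != y[j]]
            if not bad:
                return w, t            # converged
            i = bad[0]
        w = w + y[i] * X[i]
        last = i
    return w, max_updates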
#4
Thanks, I'll have to check out the proof for myself, but I trust you're right. I assume the book contains a proof; I just haven't gotten my hands on a copy yet (it's in transit).
As for performance between selecting the first mismatch vs. a random mismatch, I double-checked by running my code both ways a few times, and the random choice took considerably fewer iterations. Hopefully the other poster mixed it up when typing...
#5
I don't recall if the book contains a proof of PLA convergence or not. If not, there's an outline of one in the slides for lecture 2 of Geoffrey Hinton's neural network course at Coursera. Here's a link to those slides: lecture 2. The relevant slides are in parts 2c and 2d.
#6
The proof of PLA convergence is given as guided Problem 1.3. (Enjoy!)
The proof applies no matter which misclassified point you pick at each iteration. In practice I have not observed significant differences between the first misclassified point and a random one. As pointed out, picking a misclassified point requires some overhead; in CS-speak, it takes O(N) time. To pick the first one (first according to index), make a pass through the data and take the first point that is misclassified. For a random one, do the same, but pass through the data in a random order.
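In code, the two passes might look like this (my own sketch and names, same conventions as the earlier snippets):

Code:
import numpy as np

def first_misclassified(X, y, w):
    # One O(N) pass in index order; None means every point is correct.
    for i in range(len(y)):
        if np.sign(X[i] @ w) != y[i]:
            return i
    return None

def random_misclassified(X, y, w, rng):
    # The same O(N) pass, but through the data in a random order, so the
    # first hit is a uniformly random misclassified point.
    for i in rng.permutation(len(y)):
        if np.sign(X[i] @ w) != y[i]:
            return i
    return None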
__________________
Have faith in probability
#7
For PLA convergence, the random selection doesn't matter much, and the observed differences can be data-dependent. You may want to be aware that the PLA is a very special algorithm, though. In most of the learning algorithms that you will see in the future, the "selection" had better be random or follow some more clever strategy; an arbitrary choice may not always work well. Hope this helps.
__________________
When one teaches, two learn.
#8
You can do a (virtual) random permutation of the points. That's got to be O(N) at least, though.
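One possible reading of "virtual" (my own interpretation): walk the indices with a random start and a stride coprime to N, which visits every point exactly once without materializing a shuffled array. The scan itself is still O(N); the trick only saves the memory of an explicit shuffle:

Code:
from math import gcd
import random

def virtual_order(n, rng=random):
    # Yields 0..n-1 exactly once in a scrambled order: (start + k*step) % n
    # covers all residues when gcd(step, n) == 1. This is a weak
    # permutation, not uniform over all n! orderings.
    if n == 1:
        yield 0
        return
    start = rng.randrange(n)
    step = rng.randrange(1, n)
    while gcd(step, n) != 1:
        step = rng.randrange(1, n)
    for k in range(n):
        yield (start + k * step) % n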
#10
I think the algorithm "moves in the right direction" on the last point, though the sign of the dot product could still be incorrect after an iteration (this is what Exercise 1.3 in the book walks you through). However, there's no guarantee that this improvement will ever get you there (it could be a monotone sequence of increasingly small steps, though I think it's easy enough to convince oneself this won't happen), or that the point will not get "messed up" later when you are taking care of a different misclassified point in another iteration.
Problem (as opposed to "exercise") 1.3 of chapter one, as magdon pointed out, steps you through a full proof of PLA convergence. I haven't attempted it yet, but it seems pretty involved.
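For anyone who wants the shape of the argument before tackling Problem 1.3, here's the standard sketch (notation mine, not the book's; assume w(0) = 0 and the update w(t+1) = w(t) + y(t) x(t) on a misclassified point):

% Separability: some w^* satisfies y_n {w^*}^\top x_n \ge \rho > 0 for all n,
% and the data are bounded: \|x_n\| \le R. Then, per update,
\begin{align*}
  w(t{+}1)^\top w^* &= w(t)^\top w^* + y(t)\,x(t)^\top w^*
      \;\ge\; w(t)^\top w^* + \rho
      &&\Longrightarrow\; w(t)^\top w^* \ge t\rho,\\
  \|w(t{+}1)\|^2 &= \|w(t)\|^2 + 2\,y(t)\,w(t)^\top x(t) + \|x(t)\|^2
      \;\le\; \|w(t)\|^2 + R^2
      &&\Longrightarrow\; \|w(t)\|^2 \le t\,R^2,
\end{align*}
% where the cross term is \le 0 precisely because the point was misclassified.
% Cauchy-Schwarz then gives t\rho \le \|w(t)\|\,\|w^*\| \le \sqrt{t}\,R\,\|w^*\|,
% hence t \le R^2\|w^*\|^2/\rho^2: the number of updates is finite no matter
% which misclassified point is picked each time.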