LFD Book Forum  

#1  01-09-2013, 01:00 PM
Anne Paulson (Senior Member)

PLA: Is randomly selecting the update important?

The official description of the perceptron learning algorithm (PLA) says that we randomly select an incorrectly classified point, and then update the weights to nudge that point closer to being correctly classified.
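For concreteness, a minimal sketch of the update being described, assuming the standard PLA rule w <- w + y*x (NumPy; the function name and the augmented-input convention are mine):

Code:
import numpy as np

def pla_update(w, x, y):
    # Standard PLA update on a misclassified point (x, y):
    # x is the augmented input (1, x1, ..., xd), y is +1 or -1, and
    # (x, y) is misclassified when sign(w @ x) != y.
    return w + y * x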

If we actually select the next point uniformly at random from the misclassified points, we're going to have to do a lot of annoying bookkeeping: every single time we update, we'll have to recheck every single point, figure out which ones are now misclassified (remembering that a previously correctly classified point can become misclassified after an update), and randomly pick one of the bad points for the next update.

Now, obviously, if the PLA is going to converge, it will converge no matter how we pick the next misclassified point, as long as we eventually get to every misclassified point. How much difference does it make if we select our update points non-randomly? I'll do the random update for the homework, just in case it makes a difference, but I suspect it doesn't on average.
#2  01-09-2013, 01:12 PM
Yann222 (Member)

Re: PLA: Is randomly selecting the update important?

I had originally chosen the first mismatched point instead of a randomly selected one. For N=10 there wasn't much difference, but for N=100 the random choice performed better (to the point where it could affect the selected answer).

Also, though I wasn't able to make it happen, it seems the algorithm may not converge if the next point is not selected randomly (it could get stuck in some sort of infinite loop).
#3  01-09-2013, 01:24 PM
Anne Paulson (Senior Member)

Re: PLA: Is randomly selecting the update important?

No, the algorithm will always converge eventually if the data are linearly separable, no matter how you pick the misclassified point. And the proof isn't that difficult.

But now we have one person in one thread saying that picking the points at random gave better results, and another person in another thread saying that picking the first misclassified point gave better results. Mysterious.

My strategy would be to start with the last misclassified point, the one I just updated with: try it again if it is still misclassified, and then pick the next misclassified point in the list.
#4  01-09-2013, 01:52 PM
Yann222 (Member)

Re: PLA: Is randomly selecting the update important?

Thanks, I'll have to check out the proof for myself, but I trust you're right. I assume the book contains a proof; I just haven't gotten my hands on a copy yet (it's in transit).

As for the performance of selecting the first mismatch vs. a random mismatch, I double-checked by running my code both ways a few times, and the random choice took considerably fewer iterations. Hopefully the other poster just mixed it up when typing...
#5  01-09-2013, 02:19 PM
tzs29970 (Invited Guest)

Re: PLA: Is randomly selecting the update important?

I don't recall whether the book contains a proof of PLA convergence. If not, there's an outline of one in the slides for Lecture 2 of Geoffrey Hinton's neural networks course on Coursera. Here's a link to those slides: lecture 2. The relevant slides are in parts 2c and 2d.
#6  01-09-2013, 02:55 PM
magdon (RPI)

Re: PLA: Is randomly selecting the update important?

The proof of PLA convergence is given as guided Problem 1.3. (Enjoy!)

The proof applies no matter which misclassified point you pick at each iteration.

In practice, I have not observed significant differences between picking the first misclassified point and picking a random one.

As pointed out, picking a misclassified point requires some overhead; in CS-speak, it takes O(N) time. To pick the first one (first according to index), make a pass through the data and take the first point that is misclassified. For a random one, do the same, but pass through the data in a random order.
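A minimal sketch of both passes, in NumPy (the names w, X, y are my own, with X the N-by-(d+1) data matrix including the constant-1 column, and rng something like numpy.random.default_rng()):

Code:
import numpy as np

def first_misclassified(w, X, y):
    # One O(N) pass in index order; return the first misclassified
    # index, or None if every point is classified correctly.
    for i in range(len(y)):
        if np.sign(X[i] @ w) != y[i]:
            return i
    return None

def random_misclassified(w, X, y, rng):
    # The same O(N) pass, but in a fresh random order; the first hit
    # is then a uniformly random choice among the misclassified points.
    for i in rng.permutation(len(y)):
        if np.sign(X[i] @ w) != y[i]:
            return i
    return None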

Quote (Originally Posted by Yann222):
Thanks, I'll have to check out the proof for myself, but I trust you're right. I assume the book contains a proof; I just haven't gotten my hands on a copy yet (it's in transit).

As for the performance of selecting the first mismatch vs. a random mismatch, I double-checked by running my code both ways a few times, and the random choice took considerably fewer iterations. Hopefully the other poster just mixed it up when typing...
__________________
Have faith in probability
#7  01-09-2013, 03:31 PM
htlin (NTU)

Re: PLA: Is randomly selecting the update important?

Quote (Originally Posted by Anne Paulson):
If we actually select the next point uniformly at random from the misclassified points, we're going to have to do a lot of annoying bookkeeping: every single time we update, we'll have to recheck every single point, figure out which ones are now misclassified (remembering that a previously correctly classified point can become misclassified after an update), and randomly pick one of the bad points for the next update.
You can do a (virtual) random permutation of the points; selecting a misclassified point uniformly at random is then equivalent to selecting the "next" misclassified point in the permuted order, so no extra bookkeeping is needed.
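One way to realize this in a full PLA loop; a sketch under my own assumptions (zero initialization, an update cap, NumPy names as before):

Code:
import numpy as np

def pla(X, y, max_updates=10000, seed=0):
    # X: N x (d+1) matrix with a leading column of 1s; y: labels in {-1, +1}.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(max_updates):
        # Virtual shuffle, then take the "next" misclassified point.
        bad = next((i for i in rng.permutation(len(y))
                    if np.sign(X[i] @ w) != y[i]), None)
        if bad is None:               # no misclassified points: converged
            return w, t
        w = w + y[bad] * X[bad]       # the PLA update
    return w, max_updates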

For PLA convergence, the random selection doesn't matter much, and the observed differences can be data-dependent. Be aware, though, that the PLA is a very special algorithm. In most of the learning algorithms you will see in the future, the "selection" is better done randomly or with some cleverer strategy; an arbitrary choice may not always work well.

Hope this helps.
__________________
When one teaches, two learn.
#8  01-09-2013, 06:18 PM
Anne Paulson (Senior Member)

Re: PLA: Is randomly selecting the update important?

Quote (Originally Posted by htlin):
You can do a (virtual) random permutation of the points...

That's got to be O(N) at least, though.
#9  01-10-2013, 06:47 AM
tathagata (Junior Member)

Re: PLA: Is randomly selecting the update important?

Quote (Originally Posted by Anne Paulson):
No, the algorithm will always converge eventually if the data are linearly separable, no matter how you pick the misclassified point. And the proof isn't that difficult.

But now we have one person in one thread saying that picking the points at random gave better results, and another person in another thread saying that picking the first misclassified point gave better results. Mysterious.

My strategy would be to start with the last misclassified point, the one I just updated with: try it again if it is still misclassified, and then pick the next misclassified point in the list.
If you are referring to my post, then I am sorry; it was a typo, and randomization is always faster. About your approach: isn't the last point always classified correctly after the update, due to the update rule?
#10  01-10-2013, 09:35 AM
Yann222 (Member)

Re: PLA: Is randomly selecting the update important?

I think the algorithm "moves in the right direction" on the last point, but the sign of the dot product can still be incorrect after an iteration (this is what Exercise 1.3 in the book walks you through). However, there's no guarantee that this improvement will ever get the point correctly classified (in principle it could be a monotone sequence of ever-smaller steps, though it's easy enough to convince oneself that won't happen), or that the point won't get "messed up" later, in another iteration, when you are taking care of a different misclassified point.
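For reference, the one-line computation behind that first claim: with the standard update w' = w + y*x on a misclassified point (x, y), and using y^2 = 1,

y w'^T x = y (w + y x)^T x = y w^T x + ||x||^2 > y w^T x

so the quantity y w^T x strictly increases (the "right direction"), yet it can still be negative after the update, i.e. the point can remain misclassified.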

Problem (as opposed to "exercise") 1.3 of Chapter 1, as magdon pointed out, steps you through a full proof of PLA convergence. I haven't attempted it yet, but it seems pretty involved.