LFD Book Forum  

#1  04-07-2013, 01:58 PM
clarkfitzg
*ANSWER* Question 9

Using the PLA on 100 points and counting the number of iterations required for convergence, I observed some interesting behavior. I set the maximum number of iterations to 1000 and then ran 10,000 simulations. This histogram shows the results:

[histogram: distribution of iteration counts over 10,000 runs, capped at 1000]

The bar at 1000 can be interpreted as the simulations that would have taken more than 1000 iterations to converge. When I removed the limit on the maximum number of iterations and reran the experiment, one case took 1.4 \times 10^5 iterations to converge. From this I conclude that the perceptron doesn't always converge rapidly.

In my experiment the mean number of iterations required for convergence was around 240, which is not particularly close to the possible answers of 100 and 500. The median, however, is certainly near 100. Do other people have similar results?
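For anyone wanting to reproduce this, here is a minimal sketch of the experiment, assuming the usual homework setup (a target f through two random points in [-1, 1]^2, NumPy); the names are illustrative, and the selection rule is left as a parameter since it turns out to matter below.

Code:
import numpy as np

def run_pla(n_points=100, max_iters=1000, pick="first", rng=None):
    """Run one PLA trial; return the number of updates until convergence."""
    rng = rng or np.random.default_rng()

    # Target f: a random line through two points p, q in [-1, 1]^2,
    # expressed as weights so that sign(w_f . x) labels the points.
    p, q = rng.uniform(-1, 1, (2, 2))
    w_f = np.array([p[0] * q[1] - p[1] * q[0], p[1] - q[1], q[0] - p[0]])

    # Data set with the constant coordinate x0 = 1, labeled by f.
    X = np.column_stack([np.ones(n_points), rng.uniform(-1, 1, (n_points, 2))])
    y = np.sign(X @ w_f)

    w = np.zeros(3)  # start from the zero weight vector
    for t in range(max_iters):
        mis = np.flatnonzero(np.sign(X @ w) != y)
        if mis.size == 0:
            return t                        # converged after t updates
        i = rng.choice(mis) if pick == "random" else mis[0]
        w += y[i] * X[i]                    # PLA update step
    return max_iters                        # capped, as in the histogram

iters = [run_pla() for _ in range(10_000)]
print("mean:", np.mean(iters), "median:", np.median(iters))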
#2  04-08-2013, 07:06 AM
kafar
Re: *ANSWER* Question 9

Interestingly, I tried 10k runs with N = 100; the average number of iterations to converge was close to 500.

In my lazy implementation, I always pick the first misclassified point to update the weight vector. Looking at the theory, I don't see why that should matter, but since I got this question wrong, I guess a randomly chosen misclassified point works better.
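Assuming the run_pla sketch posted above, the two selection rules can be compared side by side (illustrative only, not the official solution):

Code:
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the comparison is repeatable
first = [run_pla(pick="first", rng=rng) for _ in range(1000)]
rand = [run_pla(pick="random", rng=rng) for _ in range(1000)]
print("first-point rule :", np.mean(first), np.median(first))
print("random-point rule:", np.mean(rand), np.median(rand))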
#3  04-08-2013, 09:15 AM
Michael Reach
Re: *ANSWER* Question 9

My mean was considerably smaller, but there is a very long tail. If you graph the lines at various stages, you can actually see the hypothesis line h jiggling back and forth as it gets closer to the target line g. The size of the jiggles damps down as you go along. It needs to get close enough that none of your randomly chosen points is on the wrong side.

If you picture shifting the g line around as far as you can until it bumps into one of the points, you get the range of potential solution lines h. How much room you have depends on the initial points; often you'll have lots of room. But one could imagine three points that are almost exactly collinear, with the g line passing just between them, leaving almost no room to jiggle! In a case like that, it can take an awfully long time for the jiggling of the h line to settle down enough that it stops jumping across that very narrow range.
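To watch this jiggling yourself: a weight vector w = (w_0, w_1, w_2) corresponds to the boundary w_0 + w_1 x_1 + w_2 x_2 = 0, so it can be drawn as a line whenever w_2 \neq 0. A hypothetical matplotlib helper:

Code:
import numpy as np
import matplotlib.pyplot as plt

def plot_boundary(w, **style):
    """Plot the line w[0] + w[1]*x1 + w[2]*x2 = 0 over x1 in [-1, 1]."""
    x1 = np.array([-1.0, 1.0])
    x2 = -(w[0] + w[1] * x1) / w[2]  # assumes w[2] != 0
    plt.plot(x1, x2, **style)

Calling this on w after each update, with the target's weights drawn in another color, shows h oscillating toward g with shrinking jumps.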
#4  04-08-2013, 12:22 PM
kafar
Re: *ANSWER* Question 9

Quote:
Originally Posted by Michael Reach (see post #3 above)
Very well said. I think it explains why the PLA should pick misclassified points randomly for faster convergence. Perhaps at a certain point the algorithm should pick the misclassified point closest to the boundary, so that h won't jump back and forth too much.
#5  04-08-2013, 04:02 PM
IsidroHidalgo
Re: *ANSWER* Question 9

I think this is an interesting question: does the misclassified point chosen at each iteration really matter? It's very easy to calculate the distance of each point and select the closest...
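For reference: the distance from a point x to the boundary is |w \cdot x| / \|(w_1, w_2)\|, and since that norm is the same for every point at a given step, picking the closest misclassified point reduces to an argmin over |w \cdot x|. A hypothetical selection rule, in the same conventions as the run_pla sketch above:

Code:
import numpy as np

def pick_closest(X, y, w):
    """Index of the misclassified point nearest the current boundary."""
    scores = X @ w
    mis = np.flatnonzero(np.sign(scores) != y)
    return mis[np.argmin(np.abs(scores[mis]))]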
#6  04-08-2013, 04:52 PM
Elroch
Re: *ANSWER* Question 9

Quote:
Originally Posted by clarkfitzg (see post #1 above)
I suspect you did not choose your points randomly for the iterative step. If you, for example, always search sequentially through the points for a misclassified point, I believe it is not so uncommon to get caught in a loop of misclassified points. This must eventually resolve itself, but might take a while, and this leads to more long runs and a lot higher average number of iterations. I found that randomly picking misclassified points decreases the average number of iterations a lot and shortens the long tail, presumably by limiting repeated bad choices of misclassified point.
#7  04-08-2013, 08:26 PM
kafar
Re: *ANSWER* Question 9

Quote:
Originally Posted by Elroch (see post #6 above)
I see it from a different angle: I wouldn't call them bad choices of misclassified points, just less informative ones. I've been thinking about it a bit. If the misclassified points are picked randomly, the algorithm has a better chance of obtaining the "knowledge" contained in the entire sample, whereas obtaining that knowledge by picking points sequentially is always much slower; hence the slower convergence.

Re: IsidroHidalgo, it was just a guess; I didn't run the experiment to verify it. I don't think it would be a good idea to look at distances from the start, but after the entire sample has been looked at, *maybe* closer points are better choices. For the purposes of this homework, randomly choosing a misclassified point is the way to go.
#8  04-09-2013, 08:37 AM
jlaurentum
Re: *ANSWER* Question 9

Interesting observation about randomly choosing the misclassified point, Elroch.
If I had run more simulations, like some did above, I'd probably have answered question 9 incorrectly as well; it seems there are too many outliers in the iteration counts when taking the first misclassified point.

How about taking the misclassified point that errs farthest off in the inner product? That is, taking the misclassified point for which abs(w*x) is largest? What do you guys think?
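A minimal sketch of that rule, under the same conventions as the earlier snippets (untested, as the thread notes); it is the mirror image of the closest-point idea, with argmax in place of argmin:

Code:
import numpy as np

def pick_farthest(X, y, w):
    """Index of the misclassified point with the largest abs(w . x)."""
    scores = X @ w
    mis = np.flatnonzero(np.sign(scores) != y)
    return mis[np.argmax(np.abs(scores[mis]))]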
#9  04-09-2013, 09:15 AM
Elroch
Re: *ANSWER* Question 9

Quote:
Originally Posted by jlaurentum
If I had run more simulations, like some did above, I'd probably have answered question 9 incorrectly as well.
Lucky fellow!

Quote:
How about taking the misclassified point that errs farthest off in the inner product? That is, taking the misclassified point for which abs(w*x) is largest?
I had the same thought, but have not implemented it. Has anyone?
#10  04-09-2013, 07:24 PM
clarkfitzg
Re: *ANSWER* Question 9

Thanks for the insightful replies! Indeed, I was choosing the first misclassified point every time. I updated my code to choose randomly and produced the following results:

[histogram: iteration counts with randomly chosen misclassified points]

The tail past 1000 is visibly smaller and the mean is much closer to the correct answer of 100 when the misclassified point is randomly chosen.