#1




*ANSWER* Question 9
Using the PLA on 100 points and counting the number of iterations required for convergence I observed some interesting behavior. I set the max number of iterations to 1000 and then ran 10,000 simulations. This histogram shows the results:
The data at 1000 can be interpreted as those simulations which would have taken over 1000 iterations to converge. When I removed the limit on the max number of iterations and ran the experiment I had one case take iterations to converge. From this I conclude that the perceptron doesn't always converge rapidly. In my experiment the mean number of iterations required for convergence was around 240, which is not particularly close to the possible answers of 100 and 500. The median, however, is certainly near 100. Do other people have similar results? 
#2




Re: *ANSWER* Question 9
interestingly, i tried a 10k runs of N=100. the avg number of iterations to converge is close to 500.
In my lazy implementation, I always pick the first misclassified point to update the weight vector. Looking at the theory, I don't see how that matters. But since I got this question wrong, I guess a randomly chosen misclassified point would works better. 
#3




Re: *ANSWER* Question 9
My mean was considerably smaller, but there is a very long tail. If you graph the lines at various stages, you can actually see the h hypothesis line jiggling back and forth as it gets closer to the "g" line. The size of the jiggles damps down as you go along. It needs to get close enough so that none of your random chosen set of points is on the wrong side.
If you picture shifting the g line around as far as you can till it bumps into one of the points, you get the range of potential solution "h" lines. You'll see that it depends on the initial points how much room you have for that; often you'll have lots of room. But one could imagine three points that are almost exactly colinear and the g line goes just between them  and there's almost no room to jiggle! In a case like that, it can take an awfully long time till the jiggling of the the h line settles down far enough so that it doesn't keep jumping across that very narrow range. 
#4




Re: *ANSWER* Question 9
Quote:

#5




Re: *ANSWER* Question 9
I think this is a interesting question, do you mean really matters the misclassified point choosed at each iteration? It's very easy calculate distance of each point to select the closest...

#6




Re: *ANSWER* Question 9
Quote:

#7




Re: *ANSWER* Question 9
Quote:
RE:IsidroHidalgo it was just a guess. i didn't run the experiment to verify. i don't think it would be good idea to look at distance from the start. but after the entire sample has been looked at, **maybe** closer points can be better choices. for the sake of this homework, randomly chosen misclassified points is the way to go. 
#8




Re: *ANSWER* Question 9
Interesting observation about randomly chosing the misclassified point, Elroch.
If I'd have done more iterations, like some did above, I'd probably have answered question 9 incorrectly as well it seems there are too many outliers for "iterations" when taking the first misclassified point. How about taking the misclassified point that errs farther off in the inner product? that is, taking the misclassified point for which abs(w*x) is largest? What do you guys think? 
#9




Re: *ANSWER* Question 9
Quote:
Quote:

#10




Re: *ANSWER* Question 9
Thanks for the insightful replies! Indeed I was choosing the first misclassified point every time. I updated my code to choose randomly and produced the following results:
The tail past 1000 is visibly smaller and the mean is much closer to the correct answer of 100 when the misclassified point is randomly chosen. 
Thread Tools  
Display Modes  

