Re: *ANSWER* Question 9
Interestingly, I tried 10k runs with N=100. The average number of iterations to converge is close to 500.
In my lazy implementation, I always pick the first misclassified point to update the weight vector. Looking at the theory, I don't see why that should matter. But since I got this question wrong, I guess picking a misclassified point at random works better.
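For anyone who wants to compare the two selection rules directly, here is a rough sketch, not my actual code. The setup is an assumption on my part: points uniform in [-1, 1]^2, a target line through two random points, and weights starting at zero. The names `make_data`, `pla`, and `average_updates` are made up for this example. As far as I can tell, the convergence proof only gives an upper bound that holds for any choice of misclassified point, so it says nothing about which rule is faster on average.

```python
import numpy as np

def make_data(n_points, rng):
    """Assumed setup: uniform points in [-1, 1]^2, labeled by a random line."""
    p, q = rng.uniform(-1, 1, size=(2, 2))   # two points defining the target line
    d = q - p
    # Target f(x) = sign(w0 + w1*x1 + w2*x2) for the line through p and q.
    w_target = np.array([d[1] * p[0] - d[0] * p[1], -d[1], d[0]])
    # Prepend the constant coordinate x0 = 1 to every point.
    X = np.column_stack([np.ones(n_points), rng.uniform(-1, 1, size=(n_points, 2))])
    y = np.sign(X @ w_target)
    return X, y

def pla(X, y, pick, rng, max_iters=100_000):
    """Perceptron learning; returns (number of updates, final weights).

    pick="first" uses the lowest-index misclassified point,
    pick="random" chooses uniformly among the misclassified points.
    """
    w = np.zeros(X.shape[1])
    for updates in range(max_iters):
        # With w = 0 every prediction is sign(0) = 0, so all points count
        # as misclassified on the first pass.
        misclassified = np.flatnonzero(np.sign(X @ w) != y)
        if misclassified.size == 0:
            return updates, w
        i = misclassified[0] if pick == "first" else rng.choice(misclassified)
        w = w + y[i] * X[i]
    return max_iters, w   # safety cap reached without converging

def average_updates(pick, n_runs=1000, n_points=100, seed=0):
    """Average number of updates to converge over n_runs random experiments."""
    rng = np.random.default_rng(seed)
    counts = [pla(*make_data(n_points, rng), pick, rng)[0] for _ in range(n_runs)]
    return float(np.mean(counts))
```

Calling `average_updates("first")` and `average_updates("random")` gives the two averages side by side. The exact numbers will depend on the seed and on the setup assumptions above.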
