Quote:
Originally Posted by OlivierB
Oh ! Thank you for clarifying this point.
Actually no, I was randomly choosing one f and training set, and then running 1000 PLA on these.
The variation in convergence speed were coming from the randomly selected misclassified point at each step.
Indeed, on rereading hw1.pdf more carefully, I see that you mention "In each run, choose a random line in the plane as your target function...".
Now for each of the 1000 runs, I randomly choose a single f and training set and then run on PLA on them.
The convergence speed is now very stable, and the questions 7 and 9 do not seem ambiguous any longer.
As I already have submitted, I know that I probably get the correct results for questions 7, 8, and 10. But not for question 9! So there is still something puzzling... If by experience you have an idea about my probable mistake, I would be grateful for your insight (In fact, I already am for your first answer).

There is a quantitative indication of one part of the variation in the number of iterations in the proof of convergence of the PLA algorithm in Problem 1.3 in the book. In some problems (i.e. two separable sets of points), a much larger range of lines will perform the classification than in others. If one randomly picked lines, the expected time to converge would vary in a predictable manner between problems. Difficult problems for such a random line picking algorithm (if it deserves the term) are also ones with a loose bound on convergence using PLA.
eg) a 100 point problem can be relatively easy sometimes.