Quote:
Originally Posted by shockwavephysics
I have been trying to figure out why updating using w -> w + y_n * x_n works at all. I looked up the relevant section in the text, and there are a series of questions for the student that hint at the answer. I followed that logic to its conclusion and it does seem to show that updating in that way will always give a w that is better (for the misclassified point) than the previous w. However, I cannot figure out how one comes up with this formulation in the first place. Is there a reference to a derivation I can read?

You can read Problem 1.3 of the recommended textbook, which guides you through a simple proof. Roughly speaking, the proof shows that the PLA weights get more aligned with the underlying "target weights" after each update, in the sense that their inner product with the target weights increases every time.
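
If it helps to see this numerically, here is a minimal sketch in plain Python, not taken from the textbook. It rests on several assumptions of mine: the target weights w_star, the 50 random points, the 0.1 margin filter (so that convergence is quick), and the helper names dot, sign, and pla are all hypothetical. It runs the update w -> w + y_n * x_n on one misclassified point at a time and records the inner product of w with w_star after every update. Since y_n agrees with the sign of w_star . x_n, each update adds |w_star . x_n| > 0 to that inner product. For the misclassified point itself, y_n * (w . x_n) grows by ||x_n||^2, which is the "better for that point" effect described in the question.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sign(v):
    return (v > 0) - (v < 0)

def pla(points, labels, max_updates=10_000):
    """Repeat w -> w + y_n * x_n on a misclassified point until none remain."""
    w = [0.0] * len(points[0])
    history = [list(w)]  # weights after each update, starting from zero
    for _ in range(max_updates):
        wrong = [n for n, (x, y) in enumerate(zip(points, labels))
                 if sign(dot(w, x)) != y]
        if not wrong:
            break
        n = wrong[0]  # any misclassified point would do
        w = [wi + labels[n] * xi for wi, xi in zip(w, points[n])]
        history.append(list(w))
    return w, history

# Hypothetical setup: made-up target weights (bias term first) and random
# points, keeping only those at least 0.1 away from the target boundary.
random.seed(0)
w_star = [0.2, 1.0, -1.0]
points, labels = [], []
while len(points) < 50:
    x = [1.0, random.uniform(-1, 1), random.uniform(-1, 1)]
    if abs(dot(w_star, x)) >= 0.1:
        points.append(x)
        labels.append(sign(dot(w_star, x)))

w, history = pla(points, labels)

# Inner product with the target weights after each update.
alignment = [dot(w_star, h) for h in history]
print("updates:", len(history) - 1)
print("alignment grows every update:",
      all(b > a for a, b in zip(alignment, alignment[1:])))
```

The growing inner product is only half of the argument: the proof also has to bound how fast the length of w can grow, and the problem walks through that part as well.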