04-11-2012
Re: Perceptron Learning Algorithm

Quote:
 Originally Posted by yaser
Let me use the book notation to avoid confusion. You have two points $\mathbf{x}_1$ and $\mathbf{x}_2$ (which you called a1 and a2), and their target outputs (which you called assignment) are $y_1$ and $y_2$. Either point, call it just $\mathbf{x}$ for simplicity, is a vector that has $d$ components $x_1, x_2, \ldots, x_d$. Notice that bold $\mathbf{x}$ denotes a full data point, while italic $x$ denotes a component in that data point. We add a constant 1 component to each data point and call that component $x_0$ (so $x_0 = 1$) to simplify the expression for the perceptron.

If the weight vector of the perceptron is $\mathbf{w} = (w_0, w_1, \ldots, w_d)$ (where $w_0$ takes care of the threshold value of that perceptron), then the perceptron implements
$$h(\mathbf{x}) = \text{sign}(w_0 x_0 + w_1 x_1 + \cdots + w_d x_d) = \text{sign}(\mathbf{w}^\mathsf{T}\mathbf{x})$$
where $\text{sign}(\cdot)$ returns $+1$ if its argument is positive and $-1$ if its argument is negative.

Example: Say the first data point $\mathbf{x}_1$ is two dimensional, so $d = 2$ and $\mathbf{x}_1 = (x_1, x_2)$. Add the constant component $x_0 = 1$ and you have $\mathbf{x}_1 = (1, x_1, x_2)$. Therefore, the perceptron's output on this point is $\text{sign}(w_0 + w_1 x_1 + w_2 x_2)$. If this formula returns a value that is different from the target output $y_1$, the PLA adjusts the values of the weights, trying to make the perceptron output agree with the target output for this point $\mathbf{x}_1$. It uses the specific PLA update rule to achieve that.
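To make the notation in the quote concrete, here is a minimal sketch of how the perceptron output is computed once the constant component $x_0 = 1$ is prepended. The function names and the numeric weights and point are hypothetical, chosen only for illustration. The quote does not say what sign returns for an argument of exactly 0; this sketch assumes it returns 0, which never matches a target of $+1$ or $-1$, so such a point counts as misclassified.

```python
def sign(s):
    """Return +1 for positive s, -1 for negative s.
    Assumption: exactly 0 returns 0, which never equals a target of +1/-1."""
    if s > 0:
        return 1
    if s < 0:
        return -1
    return 0


def add_bias(x):
    """Prepend the constant component x_0 = 1 to a raw data point."""
    return [1.0] + list(x)


def perceptron_output(w, x):
    """h(x) = sign(w_0*x_0 + w_1*x_1 + ... + w_d*x_d) with x_0 = 1."""
    xb = add_bias(x)
    return sign(sum(wi * xi for wi, xi in zip(w, xb)))


# Two-dimensional example (d = 2), so w = (w_0, w_1, w_2).
w = [0.5, -1.0, 2.0]   # hypothetical weights; w_0 plays the role of the threshold
x1 = (3.0, 1.0)        # hypothetical raw data point (x_1, x_2)
print(perceptron_output(w, x1))  # sign(0.5 - 3.0 + 2.0) = sign(-0.5) -> -1
```

If the target for this point were $y_1 = +1$, the output $-1$ would disagree with it, and that disagreement is what triggers a PLA update.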
I have been trying to figure out why updating with $\mathbf{w} \leftarrow \mathbf{w} + y_n \mathbf{x}_n$ works at all. I looked up the relevant section in the text, and there is a series of questions for the student that hint at the answer. I followed that logic to its conclusion, and it does seem to show that updating in that way always gives a $\mathbf{w}$ that is better (for the misclassified point) than the previous $\mathbf{w}$. However, I cannot figure out how one comes up with this formulation in the first place. Is there a reference to a derivation I can read?
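The "better for the misclassified point" property can be stated precisely. A point is misclassified when $y_n \mathbf{w}^\mathsf{T}\mathbf{x}_n \le 0$. After the update $\mathbf{w}' = \mathbf{w} + y_n \mathbf{x}_n$, we get $y_n \mathbf{w}'^\mathsf{T}\mathbf{x}_n = y_n \mathbf{w}^\mathsf{T}\mathbf{x}_n + y_n^2\,\mathbf{x}_n^\mathsf{T}\mathbf{x}_n = y_n \mathbf{w}^\mathsf{T}\mathbf{x}_n + \lVert\mathbf{x}_n\rVert^2$, because $y_n^2 = 1$. Since $\mathbf{x}_n$ includes $x_0 = 1$, $\lVert\mathbf{x}_n\rVert^2 \ge 1$, so the quantity strictly increases toward being positive. The sketch below runs a plain PLA loop and checks that identity at every update. The names `pla` and `pla_update`, the zero starting weights, the "take the first misclassified point" choice, and the toy dataset are all assumptions for illustration; this shows the algebra, not the derivation of where the rule comes from.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def add_bias(x):
    """Prepend the constant component x_0 = 1 to a raw data point."""
    return [1.0] + list(x)


def pla_update(w, x, y):
    """One PLA step on a misclassified point: w <- w + y * x (x already has x_0 = 1)."""
    return [wi + y * xi for wi, xi in zip(w, x)]


def pla(points, labels, max_iters=1000):
    """Repeatedly pick a misclassified point and apply the update.
    Returns the final weights and the number of updates made."""
    xs = [add_bias(p) for p in points]
    w = [0.0] * len(xs[0])  # assumption: start from the zero vector
    for t in range(max_iters):
        # Misclassified means y * w^T x <= 0 (this also catches w^T x == 0).
        wrong = [(x, y) for x, y in zip(xs, labels) if y * dot(w, x) <= 0]
        if not wrong:
            return w, t
        x, y = wrong[0]  # any misclassified point may be used; we take the first
        before = y * dot(w, x)
        w = pla_update(w, x, y)
        after = y * dot(w, x)
        # y * w^T x on this point grows by exactly ||x||^2, as in the algebra above.
        assert abs((after - before) - dot(x, x)) < 1e-9
    return w, max_iters


points = [(2.0, 3.0), (1.0, 4.0), (-1.0, -2.0), (-3.0, -1.0)]  # hypothetical, linearly separable
labels = [1, 1, -1, -1]
w, updates = pla(points, labels)
print(w, updates)  # -> [1.0, 2.0, 3.0] 1
```

A single update can make other points worse, so this identity alone does not prove convergence; it only shows that each step moves in the right direction on the point that triggered it.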