#34, 04-11-2012, 02:36 PM
shockwavephysics
Member
Join Date: Apr 2012
Posts: 17
Default Re: Perceptron Learning Algorithm

Quote:
Originally Posted by yaser
Let me use the book notation to avoid confusion. You have two points {\bf x}_1 and {\bf x}_2 (which you called a1 and a2) and their target outputs (which you called assignment) are f({\bf x}_1)=+1 and f({\bf x}_2)=-1.

Either point, call it just {\bf x} for simplicity, is a vector that has d components: {\bf x}=(x_1,x_2,\cdots,x_d). Notice that bold {\bf x} denotes a full data point, while italic x denotes a component of that data point. We add a constant component x_0=1 to each data point {\bf x} to simplify the expression for the perceptron. If the weight vector of the perceptron is {\bf w}=(w_0,w_1,w_2,\cdots,w_d) (where w_0 takes care of the threshold value of that perceptron), then the perceptron implements

{\rm sign}(w_0x_0+w_1x_1+w_2x_2+\cdots+w_dx_d)

where {\rm sign}(\cdot) returns +1 if its argument is positive and returns -1 if its argument is negative.

Example: Say the first data point is {\bf x}_1=(3,4) (two dimensional, so d=2). Add the constant component x_0=1 and you have {\bf x}_1=(1,3,4). Therefore, the perceptron's output on this point is {\rm sign}(w_0+3w_1+4w_2). If this formula returns -1, which is different from the target output +1, the PLA adjusts the values of the weights, trying to make the perceptron's output agree with the target output for this point {\bf x}. It uses the specific PLA update rule to achieve that.
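
To make the quoted example concrete, here is a minimal sketch in Python (the weight values below are made up for illustration; they are not from the lecture):

Code:
import numpy as np

# Hypothetical weight vector (w0, w1, w2); w0 plays the role of the threshold.
w = np.array([-0.5, 0.2, 0.1])

# The data point x1 = (3, 4) with the constant component x0 = 1 prepended.
x = np.array([1.0, 3.0, 4.0])

# Perceptron output: sign(w0*x0 + w1*x1 + w2*x2).
print(np.sign(np.dot(w, x)))   # sign(-0.5 + 0.6 + 0.4) = sign(0.5) = +1.0

With these particular weights the perceptron happens to output +1; with other weights it could output -1, and that disagreement is what triggers the PLA update.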
I have been trying to figure out why updating with {\bf w} \to {\bf w} + y_n{\bf x}_n works at all. I looked up the relevant section in the text, and there is a series of questions for the student that hint at the answer. I followed that logic to its conclusion, and it does seem to show that updating this way always gives a {\bf w} that is better (for the misclassified point) than the previous {\bf w}. However, I cannot figure out how one would come up with this formulation in the first place. Is there a reference to a derivation I can read?
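
For reference, the logic in the text boils down to a one-line check: if {\bf x}_n is misclassified, then y_n({\bf w}^T{\bf x}_n) \le 0, and after the update the same quantity becomes y_n({\bf w}+y_n{\bf x}_n)^T{\bf x}_n = y_n({\bf w}^T{\bf x}_n) + \|{\bf x}_n\|^2, which is strictly larger since y_n^2=1. A quick numerical sanity check (the numbers here are made up, not from the text):

Code:
import numpy as np

w = np.array([0.0, -1.0, 0.5])   # hypothetical current weights (w0, w1, w2)
x = np.array([1.0, 3.0, 4.0])    # data point with constant component x0 = 1
y = 1                            # target output f(x) = +1

before = y * np.dot(w, x)        # 1 * (0 - 3 + 2) = -1  -> misclassified
w_new = w + y * x                # the PLA update rule
after = y * np.dot(w_new, x)     # grows by ||x||^2 = 1 + 9 + 16 = 26

print(before, after)             # -1.0 25.0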