Re: How is linearity of PLA obvious?
May I add a technical point to the answer: for maximum generality, it is necessary to deal with all possible cases of the weight parameters.
The case where parameter w2 is nonzero was dealt with by magdon. The case where w1 is nonzero similarly gives a straight-line separator in the plane. But if both w1 and w2 are zero, there is no separation: the perceptron function is constant over all points (1, x1, x2). This does not cause any problems with the operation of the algorithm, since an update on any misclassified point with x1 or x2 nonzero makes w1 or w2 nonzero.
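To spell the cases out, assuming the usual hypothesis h(x) = sign(w0 + w1 x1 + w2 x2), where w0 is the weight on the constant coordinate 1 (w0 is my notation; it is not named above), the boundary is the set where the sum is zero:

```latex
\begin{align*}
  w_0 + w_1 x_1 + w_2 x_2 &= 0 \\
  w_2 \neq 0:\quad x_2 &= -\frac{w_1}{w_2}\,x_1 - \frac{w_0}{w_2} \\
  w_1 \neq 0:\quad x_1 &= -\frac{w_2}{w_1}\,x_2 - \frac{w_0}{w_1} \\
  w_1 = w_2 = 0:\quad h(x) &= \operatorname{sign}(w_0) \quad \text{for every } (x_1, x_2)
\end{align*}
```

The first two cases are both straight lines (the second is vertical when w2 = 0); the last has no boundary in the plane at all.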
To be maximally pedantic, in the case where the input data set consists of the one point (1, x1, x2)=(1, 0, 0) with y either +1 or -1, and we start with w1=w2=0, the perceptron algorithm still works (in one step) but it never gives a line in the plane.
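For anyone who wants to check this by hand, here is a minimal sketch of the perceptron update, assuming the standard rule w <- w + y*x on a misclassified point and zero initial weights. The names pla, sign and describe are my own, purely illustrative, and picking the first misclassified point is just one possible choice.

```python
def sign(v):
    # Returns +1, -1, or 0 (0 counts as misclassified for labels +1/-1).
    return (v > 0) - (v < 0)


def pla(points, labels, w=(0.0, 0.0, 0.0), max_steps=1000):
    """Run perceptron updates on points of the form (1, x1, x2).

    Returns the final weights [w0, w1, w2] and the number of updates made.
    max_steps is a safety cap (an assumption of this sketch) in case the
    data is not linearly separable.
    """
    w = list(w)
    steps = 0
    while steps < max_steps:
        # Find the first misclassified point, if any.
        miss = next(
            ((x, y) for x, y in zip(points, labels)
             if sign(sum(wi * xi for wi, xi in zip(w, x))) != y),
            None,
        )
        if miss is None:
            break  # every point is classified correctly
        x, y = miss
        w = [wi + y * xi for wi, xi in zip(w, x)]  # w <- w + y*x
        steps += 1
    return w, steps


def describe(w):
    """Classify the decision boundary w0 + w1*x1 + w2*x2 = 0."""
    _, w1, w2 = w
    if w2 != 0:
        return "non-vertical line"
    if w1 != 0:
        return "vertical line"
    return "no line"


# The pedantic case: a single data point at (1, 0, 0).
w, steps = pla([(1, 0, 0)], [+1])
print(w, steps, describe(w))
```

With the single point (1, 0, 0) the update only ever changes w0, so w1 and w2 stay at zero and describe reports "no line", even though the point ends up correctly classified. Replacing the point with, say, (1, 2, 0) gives a vertical line after one update.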
