LFD Book Forum Perceptron: should w_0 (bias) be updated?

#1
07-12-2012, 01:40 PM
 fredrmueller@gmail.com Junior Member Join Date: Jul 2012 Location: Cambridge, MASS Posts: 4
Perceptron: should w_0 (bias) be updated?

The zeroth term is just a clever way to simplify the notation by adding the threshold/bias term as another term in the sum. The value of the threshold/bias, however, is not an observed quantity - it was chosen. So I am assuming that when updating the weights, we should NOT update the zeroth weight (the threshold/bias). Is this correct?

Thanks,

-Fred
#2
07-12-2012, 02:34 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: Perceptron: should w_0 (bias) be updated?

Quote:
 Originally Posted by fredrmueller@gmail.com The zeroth term is just a clever way to simplify the notation by adding the threshold/bias term as another term in the sum. The value of the threshold/bias, however, is not an observed quantity - it was chosen. So I am assuming that when updating the weights, we should NOT update the zeroth weight (the threshold/bias). Is this correct?
In fact w_0 is just like all the other weights, and should be updated in the same way (which happens automatically when you use the PLA update rule and take x to include the zeroth coordinate x_0 = 1). The intuitive reason is that some thresholds work better than others (just as some weights work better than others) in separating the data, so making the threshold part of the learning update will result in a better value.
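A minimal sketch of this in Python (the function name and toy numbers are my own, not from the course):

```python
import numpy as np

def pla_update(w, x, y):
    """One PLA correction on a misclassified point: w <- w + y*x.

    Because x[0] is the constant 1, w[0] (the threshold/bias) is
    updated right along with the other weights: w[0] <- w[0] + y.
    """
    return w + y * x

w = np.zeros(3)                     # [w0, w1, w2]
x = np.array([1.0, 0.5, -2.0])      # x[0] = 1 is the artificial coordinate
y = 1                               # label of the misclassified point
w = pla_update(w, x, y)             # w0 moves by y, just like w1 and w2
```

No special case is needed for w0: folding the threshold into the weight vector means one update rule covers everything.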
__________________
Where everyone thinks alike, no one thinks very much
#3
07-16-2012, 12:38 PM
 Randy Junior Member Join Date: Jul 2012 Posts: 3
Re: Perceptron: should w_0 (bias) be updated?

Actually, w0 never converges to anything meaningful using the w = w + y*x update rule because x0 = 1 for all the data points and y is constrained to be either -1 or +1... so w0 just flips between -1 and zero forever without converging.

If you print out your w0 values as PLA progresses, you can see this happening.
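Here is one way to watch it (a toy separable dataset of my own construction, not the homework data):

```python
import numpy as np

# A minimal PLA run on toy separable data, recording w0 after each update.
rng = np.random.default_rng(0)
N = 20
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, size=(N, 2))])
target_w = np.array([0.3, 1.0, -1.0])        # hypothetical target line
y = np.sign(X @ target_w)

w = np.zeros(3)
w0_history = []
while True:
    misclassified = np.where(np.sign(X @ w) != y)[0]
    if misclassified.size == 0:
        break
    i = misclassified[0]
    w += y[i] * X[i]                 # x0 = 1, so w[0] changes by +/- 1
    w0_history.append(w[0])

print(w0_history)                    # w0 wanders in integer steps
```

Starting from w = 0, w0 only ever takes integer values, since each update adds y = +/-1 times x0 = 1.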
#4
07-16-2012, 06:18 PM
 tzs29970 Invited Guest Join Date: Apr 2012 Posts: 52
Re: Perceptron: should w_0 (bias) be updated?

Quote:
 Originally Posted by Randy Actually, w0 never converges to anything meaningful using the w = w + y*x update rule because x0 = 1 for all the data points and y is constrained to be either -1 or +1... so w0 just flips between -1 and zero forever without converging.
Its range is a bit more expansive than just -1 and 0. For instance, I ran PLA on 100 different instances with 23 training points, kept track of all the values w0 took on and how many times it took on each value, and this was the result:

Code:
  w0  count
-4.0      6
-3.0     76
-2.0    168
-1.0    329
 0.0    603
 1.0    538
 2.0    139
 3.0     43
 4.0     26
 5.0     15
 6.0      1
This happens because even though a large-magnitude w0 biases the output toward the sign of w0, it can still be overcome by w1 and w2, so you can get, say, a -1 point misclassified as +1 even when w0 is very negative. The same holds for large positive w0.
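A tiny made-up example of the other weights overcoming the bias:

```python
import numpy as np

# Made-up weights: a strongly negative bias that w1 still overcomes.
w = np.array([-4.0, 10.0, 0.0])     # w0 = -4
x = np.array([1.0, 1.0, 0.0])       # x0 = 1
s = np.sign(w @ x)                  # w.x = -4 + 10 = 6, so the sign is +1
```

So a point can be classified +1 despite w0 = -4, which is exactly what keeps the updates pushing w0 around.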

An interesting question is what the distribution of the values w0 can reach looks like. I don't think it's a simple random walk, because the farther w0 gets from 0, the less likely it seems to step farther away.

Maybe a Gaussian? That doesn't seem exact either - looking at a few of these, they seem skewed rather than symmetric, but I didn't look at many samples, so this could just be ordinary random variation.
#5
07-17-2012, 08:06 AM
 Randy Junior Member Join Date: Jul 2012 Posts: 3
Re: Perceptron: should w_0 (bias) be updated?

Yeah, I saw the same thing as I kept experimenting with it.

So the problem is that w0 does a random walk over the integers, without ever converging to a meaningful value, at least if you use a starting value of 0.

Since w1 and w2 determine the orientation of the dividing line between the positive and negative points, and w0 determines its location relative to the origin, it seems to me that this update rule can never find a good solution if the true dividing line does not pass through (x1=0, x2=0).
#6
07-17-2012, 08:41 AM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,478
Re: Perceptron: should w_0 (bias) be updated?

Quote:
 Originally Posted by Randy the problem is that w0 does a random walk over the integers, without ever converging to a meaningful value, at least if you use a starting value of 0. Since w1 and w2 determine the orientation of the dividing line between the positive and negative points, and w0 determines its location relative to the origin, it seems to me that this update rule can never find a good solution if the true dividing line does not pass through (x1=0, x2=0).
Hint: The perceptron with weight vector w is equivalent to the one with weight vector αw for any α > 0.
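That is, sign(w·x) = sign(αw·x) whenever α > 0, so only the direction of w matters, not its length. A quick numerical check (random data of my own making):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=3)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, size=(50, 2))])

# Rescaling w by any positive alpha leaves every classification unchanged.
ok = all(
    np.array_equal(np.sign(X @ w), np.sign(X @ (alpha * w)))
    for alpha in (0.1, 2.0, 1000.0)
)
print(ok)
```

So a w whose components keep growing can still represent a perfectly good (and fixed) dividing line.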
__________________
Where everyone thinks alike, no one thinks very much
#7
07-17-2012, 09:15 AM
 Randy Junior Member Join Date: Jul 2012 Posts: 3
Re: Perceptron: should w_0 (bias) be updated?

Right, so the obvious solution is to normalize w after each update. This causes w0 to converge along with w1 and w2. I actually implemented this in my homework submission, but it affects the number of iterations required for a given initial dividing line, depending on how far from the origin the line is. In general, after implementing normalization, the number of iterations required for convergence went up. As a result I got different answers for 7 and 9.
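The variant being described might look like this (my reconstruction, not code from the post):

```python
import numpy as np

def pla_step_normalized(w, x, y):
    """PLA update followed by rescaling w to unit length.

    The rescaling never changes which points are misclassified (only
    the direction of w matters), but it does change the relative size
    of the next y*x correction, so the algorithm follows a different
    path and can need a different number of iterations.
    """
    w = w + y * x
    return w / np.linalg.norm(w)

w = pla_step_normalized(np.zeros(3), np.array([1.0, 0.5, -2.0]), 1)
```

Note the caveat in the next post: altering the step-size geometry this way is exactly why the iteration counts changed.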
#8
07-17-2012, 11:19 AM
 JohnH Member Join Date: Jul 2012 Posts: 43
Re: Perceptron: should w_0 (bias) be updated?

It is a mistake to talk about w0 converging on its own; it is the vector w that converges. w should not be normalized after each update, because doing so alters the relative scale of the error adjustments performed in each iteration. I suspect that this could result in cases where convergence fails to occur even for a linearly separable training set.
#9
07-17-2012, 07:44 PM
 vtrajan@vtrajan.net Junior Member Join Date: Jul 2012 Posts: 5
Re: Perceptron: should w_0 (bias) be updated?

I found that for N=100, the average number of iterations to reach the final w was around 2000. In 4 out of 1000 cases, the number of iterations exceeded 100,000. Did others see similar behavior, or is there a bug in my code?

For N=10, it took about 30 iterations.
#10
07-18-2012, 03:38 AM
 JohnH Member Join Date: Jul 2012 Posts: 43
Re: Perceptron: should w_0 (bias) be updated?

Since I've not seen your code, I cannot say with certainty that it has a defect; however, the results you have reported indicate an extremely high probability that such is the case. It is also possible, though highly improbable, that you were extraordinarily unlucky and repeatedly drew random values that skewed your results.
