LFD Book Forum


fredrmueller@gmail.com 07-12-2012 02:40 PM

Perceptron: should w_0 (bias) be updated?
 
The zeroth term is just a clever way to simplify the notation by adding the threshold/bias term as another term in the sum. The value of the threshold/bias, however, is not an observed quantity - it was chosen. So I am assuming that when updating the weights, we should NOT update the zeroth weight (the threshold/bias). Is this correct?

Thanks,

-Fred

yaser 07-12-2012 03:34 PM

Re: Perceptron: should w_0 (bias) be updated?
 
Quote:

Originally Posted by fredrmueller@gmail.com (Post 3386)
The zeroth term is just a clever way to simplify the notation by adding the threshold/bias term as another term in the sum. The value of the threshold/bias, however, is not an observed quantity - it was chosen. So I am assuming that when updating the weights, we should NOT update the zeroth weight (the threshold/bias). Is this correct?

In fact w_0 is just like all the other weights, and should be updated in the same way (which happens automatically when you use the PLA update rule and take {\bf x} to include the zeroth coordinate x_0=1). The intuitive reason is that some thresholds work better than others at separating the data (just as some weights work better than others), so making the threshold part of the learning update results in a better value.
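
To make that concrete, here is a minimal sketch of PLA with the threshold folded in as w_0 (assuming NumPy and 2-D inputs; the function name and setup are illustrative, not from the book):

Code:

import numpy as np

def pla(X, y, max_iters=100000):
    """A sketch of PLA where the threshold is learned as w_0 (via x_0 = 1)."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x_0 = 1 to every point
    w = np.zeros(X_aug.shape[1])                        # w_0 is the threshold/bias
    for _ in range(max_iters):
        misclassified = np.where(np.sign(X_aug @ w) != y)[0]
        if len(misclassified) == 0:
            return w                                    # converged: all points correct
        i = np.random.choice(misclassified)             # pick any misclassified point
        w = w + y[i] * X_aug[i]                         # w_0 updated like every other weight
    return w

Here y is a vector of +1/-1 labels; the update reaches w_0 through the constant coordinate x_0 = 1.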

Randy 07-16-2012 01:38 PM

Re: Perceptron: should w_0 (bias) be updated?
 
Actually, w0 never converges to anything meaningful using the w = w + y*x update rule because x0 = 1 for all the data points and y is constrained to be either -1 or +1... so w0 just flips between -1 and zero forever without converging.

If you print out your w0 values as the PLA algorithm progresses, you can see this happening.

tzs29970 07-16-2012 07:18 PM

Re: Perceptron: should w_0 (bias) be updated?
 
Quote:

Originally Posted by Randy (Post 3437)
Actually, w0 never converges to anything meaningful using the w = w + y*x update rule because x0 = 1 for all the data points and y is constrained to be either -1 or +1... so w0 just flips between -1 and zero forever without converging.

Its range is a bit wider than just -1 and 0. For instance, I ran PLA on 100 different instances with 23 training points each, kept track of every value w0 took on and how many times it took on each value, and this was the result:

Code:

  w0   count
-4.0       6
-3.0      76
-2.0     168
-1.0     329
 0.0     603
 1.0     538
 2.0     139
 3.0      43
 4.0      26
 5.0      15
 6.0       1

This happens because, even though a large-magnitude w0 biases the output toward the sign of w0, the contributions of w1 and w2 can still overcome it, so you can get, say, a -1 point misclassified as +1 even when w0 is very negative. The same holds for large positive w0.

An interesting question is what distribution the values of w0 follow. I don't think it is a simple random walk, because the farther w0 gets from 0, the less likely it seems to step even farther away.

Maybe a Gaussian? That doesn't seem exact either: looking at a few of these histograms, they appear skewed rather than symmetric, but I didn't look at many samples, so this could just be normal random variation.
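
For anyone who wants to reproduce this kind of histogram, here is a sketch of the experiment described above, assuming NumPy and the homework's setup of a random target line over [-1, 1]^2 (the helper name and details are illustrative):

Code:

from collections import Counter
import numpy as np

def w0_histogram(n_runs=100, n_points=23):
    """Run PLA n_runs times and count every value w_0 takes after an update."""
    counts = Counter()
    for _ in range(n_runs):
        # Random target line a*x1 + b*x2 + c = 0 through two points in [-1, 1]^2.
        p, q = np.random.uniform(-1, 1, (2, 2))
        a, b, c = q[1] - p[1], p[0] - q[0], p[1] * q[0] - p[0] * q[1]
        X = np.random.uniform(-1, 1, (n_points, 2))
        y = np.sign(a * X[:, 0] + b * X[:, 1] + c)
        # PLA with x_0 = 1, recording w_0 after every update.
        X_aug = np.hstack([np.ones((n_points, 1)), X])
        w = np.zeros(3)
        while np.any(np.sign(X_aug @ w) != y):
            i = np.random.choice(np.where(np.sign(X_aug @ w) != y)[0])
            w = w + y[i] * X_aug[i]
            counts[w[0]] += 1
    return counts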

Randy 07-17-2012 09:06 AM

Re: Perceptron: should w_0 (bias) be updated?
 
Yeah, I saw the same thing as I kept experimenting with it.

So the problem is that w0 does a random walk over the integers, without ever converging to a meaningful value, at least if you use a starting value of 0.

Since w1 and w2 determine the orientation of the dividing line between the positive and negative points, and w0 determines its location relative to the origin, it seems to me that this update rule can never find a good solution if the true dividing line does not pass through (x1=0, x2=0).

yaser 07-17-2012 09:41 AM

Re: Perceptron: should w_0 (bias) be updated?
 
Quote:

Originally Posted by Randy (Post 3468)
the problem is that w0 does a random walk over the integers, without ever converging to a meaningful value, at least if you use a starting value of 0.

Since w1 and w2 determine the orientation of the dividing line between the positive and negative points, and w0 determines its location relative to the origin, it seems to me that this update rule can never find a good solution if the true dividing line does not pass through (x1=0, x2=0).

Hint: The perceptron with weight vector {\bf w} is equivalent to that with weight vector \alpha {\bf w} for any \alpha>0.
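
Spelling out the scale invariance behind the hint: for any \alpha>0, \mathrm{sign}\big((\alpha{\bf w})^{\rm T}{\bf x}\big)=\mathrm{sign}(\alpha)\,\mathrm{sign}({\bf w}^{\rm T}{\bf x})=\mathrm{sign}({\bf w}^{\rm T}{\bf x}), so the classifier depends only on the direction of {\bf w}. The absolute size of w_0 therefore means nothing by itself; what matters is its size relative to w_1 and w_2, which fixes how far the dividing line sits from the origin.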

Randy 07-17-2012 10:15 AM

Re: Perceptron: should w_0 (bias) be updated?
 
Right, so the obvious solution is to normalize w after each update. This causes w0 to converge along with w1 and w2. I actually implemented this in my solution for the homework submission, but it affects the number of iterations required for a given initial dividing line, depending on how far from the origin that line is. In general, after implementing normalization, the number of iterations required for convergence went up, so I got different answers for Problems 7 and 9.
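
For reference, the normalization step described above amounts to something like the following (a sketch assuming NumPy; as the next reply points out, this changes the size of each correction rather than just rescaling the final answer):

Code:

import numpy as np

def pla_update_normalized(w, x, y):
    """PLA correction followed by rescaling w to unit length (the variant above).
    Only the direction of w matters for classification, but normalizing after
    every step changes how much each subsequent correction moves the boundary."""
    w = w + y * x                  # usual PLA update; x includes x_0 = 1
    return w / np.linalg.norm(w)   # rescale so ||w|| = 1, which bounds w_0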

JohnH 07-17-2012 12:19 PM

Re: Perceptron: should w_0 (bias) be updated?
 
It is a mistake to talk about w_{0} converging; it is the vector w that converges. w should not be normalized after each update, because doing so alters the relative scale of the error adjustments performed at each iteration. I suspect this could produce cases where convergence fails to occur even for a linearly separable training set.

vtrajan@vtrajan.net 07-17-2012 08:44 PM

Re: Perceptron: should w_0 (bias) be updated?
 
I found that for N=100, the average number of iterations to get the final w was around 2000, and in 4 out of 1000 cases the number of iterations exceeded 100,000. Did others find similar behavior, or is there a bug in my code?

For N=10, it took about 30 iterations.

JohnH 07-18-2012 04:38 AM

Re: Perceptron: should w_0 (bias) be updated?
 
Since I've not seen your code, I cannot say with certainty that it has a defect; however, the results you have reported indicate an extremely high probability that such is the case. It is also possible, though highly improbable, that you were extraordinarily unlucky and repeatedly drew random values that skewed your results.

