LFD Book Forum > Course Discussions > Online LFD course > Homework 1

#1 | 07-12-2012, 02:40 PM | fredrmueller@gmail.com
Perceptron: should w_0 (bias) be updated?

The zeroth term is just a clever way to simplify the notation by folding the threshold/bias into the sum as another term. The value of the threshold/bias, however, is not an observed quantity - it was chosen. So I am assuming that when updating the weights, we should NOT update the zeroth weight (the threshold/bias). Is this correct?

Thanks,

-Fred

#2 | 07-12-2012, 03:34 PM | yaser (Caltech)
Re: Perceptron: should w_0 (bias) be updated?

Quote:
Originally Posted by fredrmueller@gmail.com
The zeroth term is just a clever way to simplify the notation by folding the threshold/bias into the sum as another term. The value of the threshold/bias, however, is not an observed quantity - it was chosen. So I am assuming that when updating the weights, we should NOT update the zeroth weight (the threshold/bias). Is this correct?
In fact, w_0 is just like all the other weights and should be updated in the same way (which happens automatically when you use the PLA update rule and take {\bf x} to include the zeroth coordinate x_0=1). The intuitive reason is that some thresholds work better than others at separating the data (just as some weights work better than others), so making the threshold part of the learning update will result in a better value.
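
A minimal sketch of this in code (assuming a NumPy setup with labels in {-1, +1}; the names are illustrative, not from the course materials):

Code:
import numpy as np

def pla(X, y, max_iters=10000):
    """X: (N, d) data points; y: (N,) labels in {-1, +1}."""
    N, d = X.shape
    Xa = np.hstack([np.ones((N, 1)), X])   # prepend x_0 = 1 to every point
    w = np.zeros(d + 1)                    # w[0] is the threshold/bias
    for _ in range(max_iters):
        mis = np.where(np.sign(Xa @ w) != y)[0]
        if len(mis) == 0:
            return w                       # all points correctly classified
        i = np.random.choice(mis)          # pick a misclassified point
        w = w + y[i] * Xa[i]               # one rule updates w_0 and the rest
    return w

Because the update adds y[i] * Xa[i] to the whole vector, the bias component w[0] changes by y[i] * 1 on every update, exactly like any other component.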

#3 | 07-16-2012, 01:38 PM | Randy
Re: Perceptron: should w_0 (bias) be updated?

Actually, w0 never converges to anything meaningful under the w = w + y*x update rule, because x0 = 1 for all the data points and y is constrained to be either -1 or +1... so w0 just flips between -1 and 0 forever without converging.

If you print out your w0 values as the PLA algorithm progresses, you can see this happening.

#4 | 07-16-2012, 07:18 PM | tzs29970
Re: Perceptron: should w_0 (bias) be updated?

Quote:
Originally Posted by Randy
Actually, w0 never converges to anything meaningful under the w = w + y*x update rule, because x0 = 1 for all the data points and y is constrained to be either -1 or +1... so w0 just flips between -1 and 0 forever without converging.
Its range is a bit wider than just -1 and 0. For instance, I ran PLA on 100 different instances with 23 training points each, kept track of every value w0 took on and how many times it took on each value, and this was the result:

Code:
 w0    count
-4.0       6
-3.0      76
-2.0     168
-1.0     329
 0.0     603
 1.0     538
 2.0     139
 3.0      43
 4.0      26
 5.0      15
 6.0       1
This happens because even when w0 gets large in magnitude and biases the output toward the sign of w0, w1 and w2 can still overcome it, so you can get, say, a -1 point misclassified as +1 even if w0 is very negative. The same goes for large positive w0.

An interesting question is what distribution the values of w0 follow. I don't think it is a simple random walk, because the farther w0 gets from 0, the less likely it seems to step even farther away.

Maybe a Gaussian? That doesn't seem exact either: looking at a few of these, the counts seem skewed rather than symmetric, but I didn't look at many samples, so this could just be ordinary random variation.
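
For reference, a rough sketch of the kind of experiment described above (the setup details are assumptions: points drawn uniformly from [-1, 1]^2 and a target line through two random points, as in Homework 1):

Code:
import numpy as np
from collections import Counter

def random_target():
    # a random line through two random points in [-1, 1]^2 defines the labels
    p, q = np.random.uniform(-1, 1, (2, 2))
    a, b = q - p
    return lambda X: np.sign(b * (X[:, 0] - p[0]) - a * (X[:, 1] - p[1]))

def w0_history(X, y):
    # run PLA and record w0 after every update
    Xa = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(3)
    history = []
    while True:
        mis = np.where(np.sign(Xa @ w) != y)[0]
        if len(mis) == 0:
            return history
        i = np.random.choice(mis)
        w = w + y[i] * Xa[i]
        history.append(w[0])

counts = Counter()
for _ in range(100):                       # 100 runs of 23 training points
    f = random_target()
    X = np.random.uniform(-1, 1, (23, 2))
    counts.update(w0_history(X, f(X)))
print(sorted(counts.items()))

The exact counts differ from run to run, but the output should show w0 visiting a range of values rather than just -1 and 0, as in the table above.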

#5 | 07-17-2012, 09:06 AM | Randy
Re: Perceptron: should w_0 (bias) be updated?

Yeah, I saw the same thing as I kept experimenting with it.

So the problem is that w0 does a random walk over the integers, without ever converging to a meaningful value, at least if you use a starting value of 0.

Since w1 and w2 determine the orientation of the dividing line between the positive and negative points, and w0 determines its location relative to the origin, it seems to me that this update rule can never find a good solution if the true dividing line does not pass through (x1=0, x2=0).

#6 | 07-17-2012, 09:41 AM | yaser (Caltech)
Re: Perceptron: should w_0 (bias) be updated?

Quote:
Originally Posted by Randy
the problem is that w0 does a random walk over the integers, without ever converging to a meaningful value, at least if you use a starting value of 0.

Since w1 and w2 determine the orientation of the dividing line between the positive and negative points, and w0 determines its location relative to the origin, it seems to me that this update rule can never find a good solution if the true dividing line does not pass through (x1=0, x2=0).
Hint: The perceptron with weight vector {\bf w} is equivalent to that with weight vector \alpha {\bf w} for any \alpha>0.
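
One way to see the hint concretely (a quick numerical check, not from the thread; the weight values are made up for illustration):

Code:
import numpy as np

w = np.array([-3.0, 1.7, 0.4])              # w[0] is the bias term
X = np.hstack([np.ones((5, 1)), np.random.uniform(-1, 1, (5, 2))])
for alpha in (0.1, 2.0, 100.0):
    # sign(alpha * w . x) == sign(w . x) for any alpha > 0, so scaling w
    # leaves every classification, and hence the dividing line, unchanged
    assert np.array_equal(np.sign(X @ w), np.sign(X @ (alpha * w)))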

#7 | 07-17-2012, 10:15 AM | Randy
Re: Perceptron: should w_0 (bias) be updated?

Right, so the obvious solution is to normalize w after the update. This causes w0 to converge along with w1 and w2. I actually implemented this in my homework submission, but it affects the number of iterations required for a given initial dividing line, depending on how far from the origin that line is. In general, after implementing normalization, the number of iterations required for convergence went up. As a result I got different answers for Problems 7 and 9.
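
A sketch of the variant described here (normalizing w after every update; purely illustrative, and see the caveat in the next post about whether this is a good idea):

Code:
import numpy as np

def pla_normalized(Xa, y, max_iters=100000):
    # Xa already contains the x_0 = 1 column; y has values in {-1, +1}
    w = np.zeros(Xa.shape[1])
    for t in range(max_iters):
        mis = np.where(np.sign(Xa @ w) != y)[0]
        if len(mis) == 0:
            return w, t                     # converged after t updates
        i = np.random.choice(mis)
        w = w + y[i] * Xa[i]
        w = w / np.linalg.norm(w)           # rescale w to unit length
    return w, max_iters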

#8 | 07-17-2012, 12:19 PM | JohnH
Re: Perceptron: should w_0 (bias) be updated?

It is a mistake to talk about w_0 converging on its own; it is the vector w that converges. w should not be normalized after each update, because doing so alters the relative scale of the corrections applied at each iteration. I suspect this could produce cases where convergence fails even for a linearly separable training set.

#9 | 07-17-2012, 08:44 PM | vtrajan@vtrajan.net
Re: Perceptron: should w_0 (bias) be updated?

I found that for N=100, the average number of iterations to get the final w was around 2000. In 4 out of 1000 cases, the number of iterations exceeded 100,000. Did others find similar behavior? Or is there a bug in my code?

For N=10, it took about 30 iterations.

#10 | 07-18-2012, 04:38 AM | JohnH
Re: Perceptron: should w_0 (bias) be updated?

Since I've not seen your code, I cannot say with certainty that it has a defect; however, the results you have reported indicate an extremely high probability that such is the case. It is also possible, though highly improbable, that you were extraordinarily unlucky and repeatedly drew random values that skewed your results.