Thanks for the suggestion, Elroch. Here are the steps I am following:

1) Generate a random set of 100 data points, with values between -1 and +1. I am certain that this bit of code is working correctly.
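Roughly, the generation looks like this (a simplified sketch; I'm assuming three coordinates per point to match the three weights, and the name generate_points is just for illustration):

```python
import random

def generate_points(n=100):
    # n points, each with three coordinates uniform in [-1, +1]
    return [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
```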

2) Set weight = [0,0,0], eta = 0.01

3) Do the stochastic gradient descent.

3a) Shuffle the 100 data points. (The shuffling is definitely working.)

3b) On the first shuffled data point, get the gradient using the initial weight. To calculate the gradient I am using the code below; could this be wrong?

Code:

def gradient_descent(weight, x, y):
    # Gradient of the logistic loss for a single point (x, y)
    s = y * dot_product(weight, x)
    error1 = -(y * x[0]) / (1 + math.e**s)
    error2 = -(y * x[1]) / (1 + math.e**s)
    error3 = -(y * x[2]) / (1 + math.e**s)
    return [error1, error2, error3]
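As a sanity check on that function (with dot_product filled in as the obvious helper): at the initial weight [0, 0, 0] the dot product is 0, so each gradient component should come out to -y*x_i/2:

```python
import math

def dot_product(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gradient_descent(weight, x, y):
    s = y * dot_product(weight, x)
    return [-(y * xi) / (1 + math.e**s) for xi in x]

# At weight [0, 0, 0], s = 0 and 1 + e^0 = 2, so each component is -y*x_i/2
print(gradient_descent([0, 0, 0], [1.0, 0.5, -0.5], 1))  # [-0.5, -0.25, 0.25]
```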

3c) I then update the weight based on the above returned array. I do this by:

Code:

error = gradient_descent(weight, x, y)
# element-wise update; plain Python lists don't support "weight - eta * error"
weight = [w - eta * e for w, e in zip(weight, error)]

Again, maybe I shouldn't be using eta here, but it doesn't seem to make a difference.

3d) I repeat 3b/3c for all data points, using the updated weight for each new data point.

4) Once I have updated the weight based on all the data points, I compare the final weight of the iteration with the initial weight of the iteration.

4a) To compare the weights I use the following function, which computes the square root of the sum of squared differences (i.e. the Euclidean distance between the two weight vectors). Perhaps this is wrong?

Code:

def calc_error(new_weights, old_weights):
    # Euclidean distance between the old and new weight vectors
    return math.sqrt((old_weights[0] - new_weights[0])**2
                     + (old_weights[1] - new_weights[1])**2
                     + (old_weights[2] - new_weights[2])**2)
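A quick check that this behaves like the ordinary Euclidean distance (here written with zip so it works for any length):

```python
import math

def calc_error(new_weights, old_weights):
    return math.sqrt(sum((o - n)**2 for o, n in zip(old_weights, new_weights)))

print(calc_error([3.0, 0.0, 0.0], [0.0, 4.0, 0.0]))  # 5.0 (a 3-4-5 triangle)
```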

4b) If calc_error() < 0.01 then stop.

4c) If the error is still too large, go back to step 3, using the new weight as the initial weight.
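Putting the whole procedure together, here is a self-contained sketch of what I'm doing (assuming the data is a list of (x, y) pairs with y = ±1; the names run and max_epochs are just for illustration, and the eta/tol defaults match steps 2 and 4b):

```python
import math
import random

def dot_product(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gradient_descent(weight, x, y):
    # Gradient of the logistic loss at one point (step 3b)
    s = y * dot_product(weight, x)
    return [-(y * xi) / (1 + math.e**s) for xi in x]

def calc_error(new_weights, old_weights):
    return math.sqrt(sum((o - n)**2 for o, n in zip(old_weights, new_weights)))

def run(data, eta=0.01, tol=0.01, max_epochs=1000):
    weight = [0.0, 0.0, 0.0]                       # step 2
    for _ in range(max_epochs):
        initial = weight[:]                        # remember the epoch's starting weight
        random.shuffle(data)                       # step 3a
        for x, y in data:                          # steps 3b-3d
            g = gradient_descent(weight, x, y)
            weight = [w - eta * gi for w, gi in zip(weight, g)]
        if calc_error(weight, initial) < tol:      # steps 4-4b
            return weight
    return weight                                  # otherwise loop again (step 4c)
```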

Well, if anyone can spot the mistake I've made, or even something that doesn't look right, please say something.