#1




*ANSWER* q8/9
I got both of these questions wrong and I'm not sure what I've done wrong. Is anyone who got these right willing to post their code, so that I can compare it with mine and work out what I did wrong? (I could post my code, but it seems cruel to ask others to understand it.)
Any language is fine, but I'd prefer Python or C/C++/Java. R or Octave are okay too, though.
#2




Re: * answer * q8/9
Quote:

#3




Re: * answer * q8/9
This is my code in Octave (it is not correct but maybe you could help me find what is wrong):
Code:
function [N, theta, Eout] = trainLogisticRegression(eta, mfunc, bfunc, numPointsPerEpoch)
  % Initialize some useful values
  N = 0;
  theta = zeros(3, 1);
  theta_last = theta + 1;
  while ((abs(theta - theta_last) > 0.01) == 1), % Iterate until convergence
    N = N + 1;
    theta_last = theta;
    % Generate points for the epoch
    [X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
    % Add intercept term
    X = [ones(numPointsPerEpoch, 1) X];
    % Gradient descent
    for i = 1:numPointsPerEpoch
      e = y(i) .* X(i, :) ./ (1 + exp(y(i) * (theta' * X(i, :)')));
      % Adjust parameters given gradient
      theta = theta + eta * e';
    end;
  end
  % New set of points to calculate the error
  [X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
  % Add intercept term
  X = [ones(numPointsPerEpoch, 1) X];
  % Error measure
  Eout = 1/numPointsPerEpoch * (sum(log(1 + exp(-1 * y .* (theta' * X')'))));
end
Still, I'm missing something that I just can't quite pin down. Any help is appreciated.
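For reference, the update inside the inner loop is meant to be one stochastic gradient step on the cross-entropy error. Written out, assuming labels $y_n \in \{-1,+1\}$, inputs $\mathbf{x}_n$ that already include the intercept $x_0 = 1$, and $\mathbf{w}$ playing the role of `theta`, the per-point error, its gradient, the resulting update, and the epoch-level stopping test discussed later in this thread are:

$$e_n(\mathbf{w}) = \ln\left(1 + e^{-y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n}\right)$$

$$\nabla e_n(\mathbf{w}) = -\frac{y_n \mathbf{x}_n}{1 + e^{y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n}}$$

$$\mathbf{w} \leftarrow \mathbf{w} - \eta\,\nabla e_n(\mathbf{w}) = \mathbf{w} + \eta\,\frac{y_n \mathbf{x}_n}{1 + e^{y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n}}$$

$$\text{stop when } \left\lVert \mathbf{w}^{(t-1)} - \mathbf{w}^{(t)} \right\rVert < 0.01$$

Here $\mathbf{w}^{(t)}$ denotes the weight vector at the end of epoch $t$, and $\lVert\cdot\rVert$ is the Euclidean length of the whole difference vector, not a per-component test.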
#4




Re: * answer * q8/9
I think I fixed the problem. I was confused and generated new random points for each epoch, when what was really required was a permutation of the original training points.
This is my fix:
Code:
function [N, theta, Eout] = trainLogisticRegression(eta, mfunc, bfunc, numPointsPerEpoch)
  % Initialize some useful values
  N = 0;
  theta = zeros(3, 1);
  theta_last = theta + 1;
  % Generate training points
  [X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
  % Add intercept term
  X = [ones(numPointsPerEpoch, 1) X];
  while ((abs(theta - theta_last) > 0.01) == 1), % Iterate until convergence
    N = N + 1;
    theta_last = theta;
    % Permutation of training points
    perm = randperm(numPointsPerEpoch);
    % Gradient descent
    for i = 1:numPointsPerEpoch
      e = y(perm(i)) .* X(perm(i), :) ./ (1 + exp(y(perm(i)) * (theta' * X(perm(i), :)')));
      % Adjust parameters given gradient
      theta = theta + eta * e';
    end;
  end
  % New set of points to calculate the error
  [X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
  % Add intercept term
  X = [ones(numPointsPerEpoch, 1) X];
  % Error measures
  Eout1 = 1/numPointsPerEpoch * (sum(log(1 + exp(-1 * y .* (theta' * X')'))));
  Eout2 = 1/numPointsPerEpoch * sum((h(theta, X) - 1/2 * (1 + y)).^2);
  Eout = [Eout1 Eout2];
end
Eout1 is giving me an average of 0.3, which is not the required answer.
Eout2 is giving me an average of 0.09, which is close to the final answer.
I'm wondering if there is something still wrong with what I'm doing...
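To make the difference between Eout1 and Eout2 concrete, here is a minimal Python sketch of both measures. It assumes each row of X already starts with the intercept 1, labels are ±1, and that `h` in the Octave code is the logistic function applied to θᵀx (an assumption, since `h` is not shown). All function names are hypothetical. Eout1 is the cross-entropy error; Eout2 is a squared error between the predicted probability and the 0/1 label, which is a different quantity.

```python
import math

def sigmoid(s):
    """Logistic function theta(s) = 1 / (1 + e^-s)."""
    return 1.0 / (1.0 + math.exp(-s))

def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))

def cross_entropy_error(w, X, y):
    """Average of ln(1 + exp(-y_n * w.x_n)): the analogue of Eout1."""
    total = sum(math.log(1.0 + math.exp(-yn * dot(w, xn))) for xn, yn in zip(X, y))
    return total / len(X)

def squared_probability_error(w, X, y):
    """Average of (theta(w.x_n) - (1 + y_n)/2)^2: the analogue of Eout2.
    (1 + y_n)/2 maps the label -1 to 0 and +1 to 1."""
    total = sum((sigmoid(dot(w, xn)) - 0.5 * (1 + yn)) ** 2 for xn, yn in zip(X, y))
    return total / len(X)
```

With w = [0, 0, 0] every prediction is 0.5, so the cross-entropy is ln 2 and the squared probability error is 0.25; the two measures live on different scales, which is why only the cross-entropy version corresponds to the error the question defines.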
#5




Re: *ANSWER* q8/9
Thanks for the suggestion, Elroch. Here are the steps I am following:
1) Generate a random set of data points with values between -1 and +1. I am certain that this bit of code is working correctly.
2) Set weight = [0, 0, 0] and eta = 0.01.
3) Do the stochastic gradient descent.
3a) Shuffle the 100 data points. (The shuffling is definitely working.)
3b) For the first shuffled data point, compute the gradient using the initial weight. To calculate the gradient I am using the code below; could this be wrong?
Code:
import math

# dot_product(a, b) is a helper defined elsewhere in my code (not shown)
def gradient_descent(weight, x, y):
    # per-point gradient of ln(1 + e^(-y * w.x)) with respect to each weight
    error1 = (-y*x[0])/(1+math.e**(y*dot_product(weight,x)))
    error2 = (-y*x[1])/(1+math.e**(y*dot_product(weight,x)))
    error3 = (-y*x[2])/(1+math.e**(y*dot_product(weight,x)))
    return [error1,error2,error3]
3c) Update the weight:
Code:
error = gradient_descent(weight, x, y)
# element-wise weight - eta * error (weight and error are plain lists)
weight = [w - eta * e for w, e in zip(weight, error)]
3d) I repeat 3b/3c for all data points, using the updated weight for each new data point.
4) Once I have updated the weight based on all the data points, I compare the final weight of the iteration with the initial weight of the iteration.
4a) To compare the weights I use the following function, which computes the square root of the sum of the squared differences. Perhaps this is wrong?
Code:
def calc_error(new_weights, old_weights):
    # Euclidean distance between the weights at the start and end of the epoch
    return math.sqrt((old_weights[0] - new_weights[0])**2 + (old_weights[1] - new_weights[1])**2 + (old_weights[2] - new_weights[2])**2)
4c) If the error is still too large, go back to 3 and use the new weight as the new initial weight.
Well, if anyone can spot a mistake I've made, or even something that doesn't look right, please say something.
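For comparison, here is a minimal, self-contained Python sketch of steps 1 to 4 as described above. It is not arcticblue's code: the target-line construction (a line through two random points in [-1, 1]²), the seed, the 10,000-point test set and every function name are assumptions, and it uses plain lists rather than NumPy. The per-epoch permutation of the same training points and the Euclidean-norm stopping test follow what this thread describes.

```python
import math
import random

def make_target(rng):
    """Hypothetical helper: a random line through two points in [-1, 1]^2.
    Returns f(x1, x2) -> +1 or -1 depending on the side of the line."""
    ax, ay, bx, by = (rng.uniform(-1, 1) for _ in range(4))
    def f(x1, x2):
        # sign of the 2-D cross product (b - a) x (p - a)
        return 1 if (bx - ax) * (x2 - ay) - (by - ay) * (x1 - ax) > 0 else -1
    return f

def make_data(rng, f, n):
    """n points uniform in [-1, 1]^2, each with intercept x0 = 1 prepended."""
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        X.append([1.0, x1, x2])
        y.append(f(x1, x2))
    return X, y

def gradient(w, x, yn):
    """Per-point gradient of ln(1 + exp(-yn * w.x))."""
    s = yn * sum(wi * xi for wi, xi in zip(w, x))
    return [-yn * xi / (1.0 + math.exp(s)) for xi in x]

def sgd_logistic(X, y, rng, eta=0.01, tol=0.01):
    """SGD from w = 0; one epoch = one pass over a fresh permutation."""
    w = [0.0, 0.0, 0.0]
    epochs = 0
    while True:
        epochs += 1
        w_start = list(w)
        order = list(range(len(X)))
        rng.shuffle(order)  # permute the SAME training points each epoch
        for n in order:
            g = gradient(w, X[n], y[n])
            w = [wi - eta * gi for wi, gi in zip(w, g)]
        # Euclidean length of the change over the whole epoch
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_start, w)))
        if step < tol:
            return w, epochs

def cross_entropy(w, X, y):
    """Average cross-entropy error of w on (X, y)."""
    return sum(math.log(1.0 + math.exp(-yn * sum(wi * xi for wi, xi in zip(w, xn))))
               for xn, yn in zip(X, y)) / len(X)

if __name__ == "__main__":
    rng = random.Random(0)
    f = make_target(rng)
    X, y = make_data(rng, f, 100)
    w, epochs = sgd_logistic(X, y, rng)
    X_test, y_test = make_data(rng, f, 10000)
    print(epochs, cross_entropy(w, X_test, y_test))
```

Averaging `epochs` and the test-set cross-entropy over many independent runs (a new target and new data each time) gives the two quantities Q8/9 ask about.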
#6




Re: * answer * q8/9
Why would you think there was?

#7




Re: *ANSWER* q8/9
Hi arcticblue, my R code yields the expected average number of epochs and out-of-sample error, now that (thanks to you) I've implemented the correct exit condition for the SGD. Have you found the issue with your implementation? The approach you're outlining above seems correct. What kind of results are you getting? I'll have a look at your code if you post it.

#8




Re: * answer * q8/9
Hi apbarraza, the problem seems to be with the way you compute the magnitude of the difference between the weight vector at the beginning and at the end of each epoch. Use norm (with p = "fro") instead of abs. Also Eout1 is the way to go. Good luck!
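Why `abs` stops too early: in Octave, a `while` condition that evaluates to a vector counts as true only if it is non-empty and all of its elements are nonzero, so `abs(theta - theta_last) > 0.01` ends the loop as soon as any single component of θ changes by less than 0.01. For a vector, `norm(v, "fro")` is simply the Euclidean length. A small Python illustration of the two conditions (the function names are hypothetical):

```python
import math

def octave_style_all_greater(diff, tol):
    """Mimics Octave's `while (abs(d) > tol)` on a vector: true only
    when EVERY component changed by more than tol."""
    return all(abs(d) > tol for d in diff)

def norm_greater(diff, tol):
    """Mimics `while (norm(d, "fro") > tol)`: Euclidean length of the change."""
    return math.sqrt(sum(d * d for d in diff)) > tol

diff = [0.005, 0.2, 0.3]  # one small component, two large ones
print(octave_style_all_greater(diff, 0.01))  # False: the loop would stop here
print(norm_greater(diff, 0.01))              # True: the loop keeps going
```

The weights are clearly still moving in this example, yet the per-component test already reports convergence; the norm-based test does not.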

#9




Re: *ANSWER* q8/9
arcticblue, both the description of what you intended to do and code fragments 3b, 3c and 4a look fine to me. Sherlock Holmes's famous maxim must apply:
"when you have eliminated the impossible, whatever remains, however improbable, must be the truth". [i.e. it must be in what you haven't posted] 
#10




Re: * answer * q8/9
Quote:
YES!! Thank you sooo much. I have been banging my head against this and I can't believe I missed it. I changed the while to:
Code:
while (norm(theta - theta_last, "fro") > 0.01)
Thank you.

