 LFD Book Forum *ANSWER* q8/9
#1
 arcticblue Member Join Date: Apr 2013 Posts: 17 *ANSWER* q8/9

I got both of these questions wrong and I'm not sure where I went wrong. Is anyone who got them right willing to post their code so that I can compare it with mine? (I could post my code, but it seems cruel to ask others to understand it.)

Any language is fine, but I'd prefer Python or C/C++/Java. R or Octave are okay too.
#2
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143 Re: * answer * q8/9

Quote:
 Originally Posted by arcticblue I got both of these questions wrong and I'm not sure what I've done wrong. Is anyone who got these right willing to post their code so that I can compare it with mine to work out what I've done wrong. (I could post my code but it seems cruel to ask others to understand it.) Any language is good but prefer python or c/c++/java. R or Octave are okay too though.
Other people's code may not tell you much more than is in the description in English in the question, so a comparison with that is really the key. Making a detailed list of the things in the question your program should have implemented may be enlightening. (Most obvious guess - stopping criterion?)
#3
 apbarraza Junior Member Join Date: Jan 2013 Posts: 4 Re: * answer * q8/9

This is my code in Octave (it is not correct but maybe you could help me find what is wrong):

Code:
```
function [N, theta, Eout] = trainLogisticRegression(eta, mfunc, bfunc, numPointsPerEpoch)
  % Initialize some useful values
  N = 0;
  theta = zeros(3, 1);
  theta_last = theta + 1;

  while ((abs(theta-theta_last)>0.01)==1), %Iterate until convergence
    N = N + 1;
    theta_last = theta;
    %Generate points for the epoch
    [X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
    X = [ones(numPointsPerEpoch, 1) X];
    for i = 1:numPointsPerEpoch
      e = y(i).*X(i, :)./(1+exp(y(i)*(theta'*X(i,:)')));
      theta = theta + eta*e';
    end;
  end

  %New set of points to calculate the error
  [X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
  X = [ones(numPointsPerEpoch, 1) X];

  %Error measure
  Eout = 1/numPointsPerEpoch*(sum(log( 1 + exp(-1*y.*(theta'*X')'))));

end
```
I run this 100 times and average N and Eout to get the requested answers.
Nonetheless, I'm missing something that I just can't quite pin down.

Any help is appreciated.
#4
 apbarraza Junior Member Join Date: Jan 2013 Posts: 4 Re: * answer * q8/9

I think I fixed the problem. I was confused and generated random points for each epoch and what was really required was a permutation over the original training points.
This is my fix:
Code:
```
function [N, theta, Eout] = trainLogisticRegression(eta, mfunc, bfunc, numPointsPerEpoch)
  % Initialize some useful values
  N = 0;
  theta = zeros(3, 1);
  theta_last = theta + 1;

  %Generate training points
  [X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
  X = [ones(numPointsPerEpoch, 1) X];

  while ((abs(theta-theta_last)>0.01)==1), %Iterate until convergence
    N = N + 1;
    theta_last = theta;
    %Permutation of training points
    perm = randperm(numPointsPerEpoch);
    for i = 1:numPointsPerEpoch
      e = y(perm(i)).*X(perm(i), :)./(1+exp(y(perm(i))*(theta'*X(perm(i),:)')));
      theta = theta + eta*e';
    end;
  end

  %New set of points to calculate the error
  [X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
  X = [ones(numPointsPerEpoch, 1) X];

  %Error measure
  Eout1 = 1/numPointsPerEpoch*(sum(log( 1 + exp(-1*y.*(theta'*X')'))));
  Eout2 = 1/numPointsPerEpoch*sum((h(theta,X) - 1/2*(1+y)).^2);

  Eout = [Eout1 Eout2];

end
```
I am getting an average N of 37.100.
Eout1 is giving me an average of 0.3, which is not the required answer.
Eout2 is giving me an average of 0.09, which is close to the final answer.
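For comparison, here is a rough Python sketch of the two error measures computed above. It assumes that `h(theta, X)` (not shown in the post) is the logistic function applied to theta'x; the names `cross_entropy_error` and `squared_probability_error` and the sample points are made up for illustration. The two measures live on different scales, so their averages are not expected to agree.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))

def sigmoid(s):
    """Logistic function; assumed here to be what h(theta, X) computes."""
    return 1.0 / (1.0 + math.exp(-s))

def cross_entropy_error(w, xs, ys):
    """Mean of ln(1 + exp(-y * w.x)), the counterpart of Eout1."""
    return sum(math.log(1.0 + math.exp(-y * dot(w, x)))
               for x, y in zip(xs, ys)) / len(xs)

def squared_probability_error(w, xs, ys):
    """Mean squared gap between sigmoid(w.x) and the 0/1 label, the counterpart of Eout2."""
    return sum((sigmoid(dot(w, x)) - 0.5 * (1 + y)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

# Illustrative data: each x carries a leading 1 for the bias term.
w = [0.0, 0.0, 0.0]
xs = [[1.0, 0.5, -0.2], [1.0, -0.7, 0.3]]
ys = [1, -1]

print(cross_entropy_error(w, xs, ys))        # ln 2 (about 0.693) at w = 0
print(squared_probability_error(w, xs, ys))  # 0.25 at w = 0
```

Even with all-zero weights the two numbers differ (ln 2 versus 0.25), so a match between one of them and an expected answer says little about the other.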

I'm wondering if there is something still wrong with what I'm doing...
#5
 arcticblue Member Join Date: Apr 2013 Posts: 17 Re: *ANSWER* q8/9

Thanks for the suggestion Elroch. Here are the steps I am following:

1) Generate a random set of data points. Values between -1 and +1. I am certain that this bit of code is working correctly.
2) Set weight = [0,0,0], eta = 0.01
3) Do the stochastic gradient descent.
3a) Shuffle the 100 data points. (The shuffling is definitely working.)
3b) On the first shuffled data point get the gradient using the initial weight. To calculate the gradient I am using the code below, this could be wrong?
Code:
```
import math

def gradient_descent(weight, x, y):
    # Indices x[0], x[1], x[2] restored on the assumption that they were lost from the post.
    error1 = -(y*x[0])/(1+math.e**(y*dot_product(weight,x)))
    error2 = -(y*x[1])/(1+math.e**(y*dot_product(weight,x)))
    error3 = -(y*x[2])/(1+math.e**(y*dot_product(weight,x)))
    return [error1,error2,error3]
```
3c) I then update the weight based on the above returned array. I do this by
Code:
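```
error = gradient_descent(weight, x, y)
# Element-wise update; plain Python lists do not support `weight - eta * error` directly.
weight = [w - eta * e for w, e in zip(weight, error)]
```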
Again maybe I shouldn't be using eta here but it doesn't seem to make a difference.
3d) I repeat 3b/3c for all data points, using the updated weight for each new data point.
4) Once I have updated the weight based on all the data points I then compare the final weight of the iteration with the initial weight of the iteration.
4a) To compare the weights I use the following function which finds the sqrt of the sum of the differences squared. Perhaps this is wrong?
Code:
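```
def calc_error(new_weights, old_weights):
    # Indices [0], [1], [2] restored on the assumption that they were lost from the post.
    return math.sqrt((old_weights[0] - new_weights[0])**2 + (old_weights[1] - new_weights[1])**2 + (old_weights[2] - new_weights[2])**2)
```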
4b) If calc_error() < 0.01 then stop.
4c) If error is still too large then go to 3 and use the new weight as the new initial weight.
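Steps 1 to 4c can be put together roughly as follows. This is only a sketch under stated assumptions, not a reference solution: the target is assumed to be a line through two random points in [-1, 1] x [-1, 1], and the names `make_target`, `make_points`, `sgd` and `cross_entropy`, the `max_epochs` safety cap, and the test-set size of 1000 are all invented for illustration.

```python
import math
import random

def make_target(rng):
    """Assumed target: sign of the side of a line through two random points."""
    ax, ay, bx, by = (rng.uniform(-1, 1) for _ in range(4))
    def f(x):
        # x = [1, x1, x2]; the cross product tells which side of the line x lies on.
        s = (bx - ax) * (x[2] - ay) - (by - ay) * (x[1] - ax)
        return 1 if s > 0 else -1
    return f

def make_points(n, f, rng):
    """Step 1: n uniform points in [-1, 1]^2 with a leading 1, plus their labels."""
    xs = [[1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    return xs, [f(x) for x in xs]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def gradient(w, x, y):
    """Step 3b: gradient of ln(1 + exp(-y * w.x)) with respect to w."""
    scale = -y / (1.0 + math.exp(y * dot(w, x)))
    return [scale * xi for xi in x]

def sgd(xs, ys, rng, eta=0.01, tol=0.01, max_epochs=100000):
    """Steps 2 to 4c: one pass over a fresh permutation per epoch."""
    w = [0.0, 0.0, 0.0]
    order = list(range(len(xs)))
    for epoch in range(1, max_epochs + 1):
        w_start = list(w)
        rng.shuffle(order)                      # step 3a
        for i in order:                         # steps 3b to 3d
            g = gradient(w, xs[i], ys[i])
            w = [wi - eta * gi for wi, gi in zip(w, g)]
        if math.dist(w, w_start) < tol:         # steps 4a and 4b
            return w, epoch
    return w, max_epochs

def cross_entropy(w, xs, ys):
    """Mean of ln(1 + exp(-y * w.x)) over the given points."""
    return sum(math.log(1.0 + math.exp(-y * dot(w, x)))
               for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
f = make_target(rng)
xs, ys = make_points(100, f, rng)
w, epochs = sgd(xs, ys, rng)
test_xs, test_ys = make_points(1000, f, rng)
print(epochs, cross_entropy(w, test_xs, test_ys))
```

The exit test compares the weights at the end of an epoch with those at its start using the Euclidean distance, which is the same quantity `calc_error` computes.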

Well if anyone can spot what mistake I've made or even something that doesn't look right then please say something.
#6
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143 Re: * answer * q8/9

Quote:
 Originally Posted by apbarraza I think I fixed the problem. ... I'm wondering if there is something still wrong with what I'm doing...
Why would you think there was?
#7
 catherine Member Join Date: Apr 2013 Posts: 18 Re: *ANSWER* q8/9

Hi arcticblue, my R code yields the expected average number of epochs / out-of-sample error, now that (thanks to you) I've implemented the correct exit condition for the SGD. Have you found the issue with your implementation? The approach you're outlining above seems correct. What kind of results are you getting? I'll have a look at your code if you post it.
#8
 catherine Member Join Date: Apr 2013 Posts: 18 Re: * answer * q8/9

Hi apbarraza, the problem seems to be with the way you compute the magnitude of the difference between the weight vector at the beginning and at the end of each epoch. Use norm (with p = "fro") instead of abs. Also Eout1 is the way to go. Good luck!
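To see why the two exit tests behave differently, here is a small Python sketch. It relies on the fact that Octave treats an array-valued `while` condition as true only when every element is nonzero, so the `abs` version keeps looping only while all three weights are still moving by more than 0.01. The function names and sample weights are hypothetical.

```python
import math

def elementwise_continue(w_new, w_old, tol=0.01):
    """Mimics `while (abs(a - b) > tol)` in Octave: true only if every component exceeds tol."""
    return all(abs(a - b) > tol for a, b in zip(w_new, w_old))

def norm_continue(w_new, w_old, tol=0.01):
    """Mimics `while (norm(a - b, "fro") > tol)`: compares the Euclidean length of the change."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_new, w_old))) > tol

# One weight barely moved during the epoch, the other two moved a lot.
w_old = [0.0, 0.0, 0.0]
w_new = [0.005, 0.2, -0.3]

print(elementwise_continue(w_new, w_old))  # False: the loop would stop here
print(norm_continue(w_new, w_old))         # True: the loop would keep going
```

So the `abs` condition can end training as soon as a single component stalls, which would be consistent with a much smaller epoch count.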
#9
 Elroch Invited Guest Join Date: Mar 2013 Posts: 143 Re: *ANSWER* q8/9

arcticblue, both the description of what you intended to do and code fragments 3b, 3c and 4a look fine to me. Sherlock Holmes's famous maxim must apply:
"when you have eliminated the impossible, whatever remains, however improbable, must be the truth". [i.e. it must be in what you haven't posted]
#10
 apbarraza Junior Member Join Date: Jan 2013 Posts: 4 Re: * answer * q8/9

Quote:
 Originally Posted by catherine Hi apbarraza, the problem seems to be with the way you compute the magnitude of the difference between the weight vector at the beginning and at the end of each epoch. Use norm (with p = "fro") instead of abs. Also Eout1 is the way to go. Good luck!

YES!! Thank you sooo much. I have been banging my head and I can't believe I missed this. Changed while to:

Code:
`while ((norm(theta-theta_last, "fro")>0.01))`
And now am getting average N = 342 and Eout1 = 0.1 which is the expected answer.

Thank you.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.