 LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 5 (http://book.caltech.edu/bookforum/forumdisplay.php?f=134)

 arcticblue 05-07-2013 04:56 PM

I got both of these questions wrong and I'm not sure what I've done wrong. Is anyone who got these right willing to post their code so that I can compare it with mine and work out where I went wrong? (I could post my code, but it seems cruel to ask others to understand it.)

Any language is good, but I prefer Python or C/C++/Java. R or Octave are okay too.

 Elroch 05-07-2013 05:47 PM

Quote:
 Originally Posted by arcticblue (Post 10757) I got both of these questions wrong and I'm not sure what I've done wrong. Is anyone who got these right willing to post their code so that I can compare it with mine and work out where I went wrong? (I could post my code, but it seems cruel to ask others to understand it.) Any language is good, but I prefer Python or C/C++/Java. R or Octave are okay too.
Other people's code may not tell you much more than is in the description in English in the question, so a comparison with that is really the key. Making a detailed list of the things in the question your program should have implemented may be enlightening. (Most obvious guess - stopping criterion?)

 apbarraza 05-07-2013 06:22 PM

This is my code in Octave (it is not correct but maybe you could help me find what is wrong):

Code:

```
function [N, theta, Eout] = trainLogisticRegression(eta, mfunc, bfunc, numPointsPerEpoch)
% Initialize some useful values
N=0;
theta = zeros(3, 1);
theta_last =  theta + 1;
while ((abs(theta-theta_last)>0.01)==1), %Iterate until convergence
        N = N +1;
        theta_last = theta;
        %Generate points for the epoch
        [X, y]  = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
        % Add intercept term
        X = [ones(numPointsPerEpoch, 1) X];
        %Gradient Descent
        for i = 1:numPointsPerEpoch
                e = y(i).*X(i, :)./(1+exp(y(i)*(theta'*X(i,:)')));
                %Adjusting parameters given gradient
                theta = theta + eta*e';
        end;
end
%New set of points to calculate the error
[X, y]  = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
% Add intercept term
X = [ones(numPointsPerEpoch, 1) X];
%Error measure
Eout = 1/numPointsPerEpoch*(sum(log( 1 + exp(-1*y.*(theta'*X')'))));
end
```
I run this 100 times and average N and Eout to get the requested answers.
Nonetheless, I'm missing something that I just can't quite pin down.

Any help is appreciated.

 apbarraza 05-07-2013 07:03 PM

I think I fixed the problem. I was generating new random points for each epoch, when what was really required was a permutation of the original training points.
This is my fix:
Code:

```
function [N, theta, Eout] = trainLogisticRegression(eta, mfunc, bfunc, numPointsPerEpoch)
% Initialize some useful values
N=0;
theta = zeros(3, 1);
theta_last =  theta + 1;
%Generate training points
[X, y]  = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
% Add intercept term
X = [ones(numPointsPerEpoch, 1) X];
while ((abs(theta-theta_last)>0.01)==1), %Iterate until convergence
        N = N +1;
        theta_last = theta;
        %Permutation of training points
        perm = randperm(numPointsPerEpoch);
        %Gradient Descent
        for i = 1:numPointsPerEpoch
                e = y(perm(i)).*X(perm(i), :)./(1+exp(y(perm(i))*(theta'*X(perm(i),:)')));
                %Adjusting parameters given gradient
                theta = theta + eta*e';
        end;
end
%New set of points to calculate the error
[X, y]  = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
% Add intercept term
X = [ones(numPointsPerEpoch, 1) X];
%Error measure
Eout1 = 1/numPointsPerEpoch*(sum(log( 1 + exp(-1*y.*(theta'*X')'))));
Eout2= 1/numPointsPerEpoch*sum((h(theta,X) - 1/2*(1+y)).^2);
Eout = [Eout1 Eout2];
end
```
I am getting an average N of 37.100.
Eout1 is giving me an average of 0.3, which is not the required answer.
Eout2 is giving me an average of 0.09, which is close to the final answer.

I'm wondering if there is something still wrong with what I'm doing... :clueless:
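
For readers following along in Python, here is a minimal sketch of the change described in this post: the training set is generated once, and each epoch only visits it in a new random order. The function name `epoch_orders` and the `seed` parameter are hypothetical, introduced only for illustration.

```python
import random

def epoch_orders(num_points, num_epochs, seed=0):
    """Return one random visiting order per epoch over a fixed training set.

    The training points themselves are assumed to be generated once,
    elsewhere; only the order in which their indices are visited changes.
    """
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    orders = []
    for _ in range(num_epochs):
        order = list(range(num_points))
        rng.shuffle(order)  # plays the role of randperm(numPointsPerEpoch)
        orders.append(order)
    return orders
```

Every order returned is a permutation of the same indices, so each training point is used exactly once per epoch.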

 arcticblue 05-08-2013 03:04 AM

Thanks for the suggestion Elroch. Here are the steps I am following:

1) Generate a random set of data points. Values between -1 and +1. I am certain that this bit of code is working correctly.
2) Set weight = [0,0,0], eta = 0.01
3) Do the stochastic gradient descent.
3a) Shuffle the 100 data points. (The shuffling is definitely working.)
3b) On the first shuffled data point get the gradient using the initial weight. To calculate the gradient I am using the code below; could this be wrong?
Code:

```
def gradient_descent(weight, x, y):
    error1 = -(y*x)/(1+math.e**(y*dot_product(weight,x)))
    error2 = -(y*x)/(1+math.e**(y*dot_product(weight,x)))
    error3 = -(y*x)/(1+math.e**(y*dot_product(weight,x)))
    return [error1,error2,error3]
```
3c) I then update the weight based on the above returned array. I do this by
Code:

```
error = gradient_descent(weight, x, y)
weight = weight - eta * error
```
Again maybe I shouldn't be using eta here but it doesn't seem to make a difference.
3d) I repeat 3b/3c for all data points, using the updated weight for each new data point.
4) Once I have updated the weight based on all the data points I then compare the final weight of the iteration with the initial weight of the iteration.
4a) To compare the weights I use the following function which finds the sqrt of the sum of the differences squared. Perhaps this is wrong?
Code:

```
def calc_error(new_weights, old_weights):
    return math.sqrt((old_weights - new_weights)**2 + (old_weights- new_weights)**2 + (old_weights- new_weights)**2)
```
4b) If calc_error() < 0.01 then stop.
4c) If error is still too large then go to 3 and use the new weight as the new initial weight.

Well if anyone can spot what mistake I've made or even something that doesn't look right then please say something.
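
The steps listed in this post can be put together roughly as follows. This is only a sketch under stated assumptions, not anyone's actual submission: each point is assumed to be a plain list `[1.0, x1, x2]` with the intercept term already prepended, labels are +1 or -1, and the helper names (`dot`, `point_gradient`, `weight_change`, `sgd_logistic`) as well as the `max_epochs` safety cap are hypothetical. Each gradient component is multiplied by its own coordinate of `x`.

```python
import math
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def point_gradient(w, x, y):
    """Gradient of ln(1 + exp(-y * w.x)) with respect to w, for one point."""
    scale = -y / (1.0 + math.exp(y * dot(w, x)))
    return [scale * xi for xi in x]  # one component per coordinate of x

def weight_change(new_w, old_w):
    """Euclidean norm of the change in the weight vector over one epoch."""
    return math.sqrt(sum((n - o) ** 2 for n, o in zip(new_w, old_w)))

def sgd_logistic(points, labels, eta=0.01, tol=0.01, seed=0, max_epochs=100000):
    """Run SGD epochs until the weights move by less than tol in one epoch.

    Returns the final weights and the number of epochs run.
    max_epochs is an assumed safety cap, not part of the steps above.
    """
    rng = random.Random(seed)
    w = [0.0] * len(points[0])           # step 2: start from zero weights
    for epoch in range(1, max_epochs + 1):
        w_start = list(w)                # weights at the start of the epoch
        order = list(range(len(points)))
        rng.shuffle(order)               # step 3a: new order each epoch
        for i in order:                  # steps 3b to 3d: one update per point
            g = point_gradient(w, points[i], labels[i])
            w = [wi - eta * gi for wi, gi in zip(w, g)]
        if weight_change(w, w_start) < tol:  # steps 4a and 4b
            return w, epoch
    return w, max_epochs
```

The comparison in step 4 is between the weights at the start and at the end of a whole epoch, not between consecutive single-point updates.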

 Elroch 05-08-2013 04:54 AM

Quote:
 Originally Posted by apbarraza (Post 10760) I think I fixed the problem. ... I'm wondering if there is something still wrong with what I'm doing...:clueless:
Why would you think there was? :)

 catherine 05-08-2013 05:30 AM

Hi arcticblue, my R code yields the expected average number of epochs / out-of-sample error, now that (thanks to you) I've implemented the correct exit condition for the SGD. Have you found the issue with your implementation? The approach you're outlining above seems correct. What kind of results are you getting? I'll have a look at your code if you post it.

 catherine 05-08-2013 05:46 AM

Hi apbarraza, the problem seems to be with the way you compute the magnitude of the difference between the weight vector at the beginning and at the end of each epoch. Use norm (with p = "fro") instead of abs. Also Eout1 is the way to go. Good luck!
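
Some background on why `abs` stops too early: Octave treats a vector-valued `while` condition as true only when every element is nonzero, so the elementwise test `abs(theta-theta_last)>0.01` ends the loop as soon as any single component changes by 0.01 or less. The Python sketch below contrasts the two tests; the function names and the example weights are hypothetical.

```python
import math

def all_components_moved(new_w, old_w, tol=0.01):
    """Mimics the elementwise test: true only if EVERY component moved by more than tol."""
    return all(abs(n - o) > tol for n, o in zip(new_w, old_w))

def norm_moved(new_w, old_w, tol=0.01):
    """Norm-based test: true if the weight vector as a whole moved by more than tol."""
    return math.sqrt(sum((n - o) ** 2 for n, o in zip(new_w, old_w))) > tol

old_w = [0.0, 0.0, 0.0]
new_w = [0.005, 0.2, -0.3]  # made-up weights: one component barely moved

print(all_components_moved(new_w, old_w))  # False: the elementwise loop would stop here
print(norm_moved(new_w, old_w))            # True: the norm-based loop keeps iterating
```

With the elementwise test the loop can exit while the weight vector is still moving substantially, which would be consistent with an average N that is far too small.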

 Elroch 05-08-2013 05:58 AM

arcticblue, both the description of what you intended to do and code fragments 3b, 3c and 4a look fine to me. Sherlock Holmes's famous maxim must apply:
"when you have eliminated the impossible, whatever remains, however improbable, must be the truth". [i.e. it must be in what you haven't posted]

 apbarraza 05-08-2013 01:42 PM

Quote:
 Originally Posted by catherine (Post 10773) Hi apbarraza, the problem seems to be with the way you compute the magnitude of the difference between the weight vector at the beginning and at the end of each epoch. Use norm (with p = "fro") instead of abs. Also Eout1 is the way to go. Good luck!

YES!! Thank you sooo much. I have been banging my head and I can't believe I missed this. I changed the while to:

Code:

`while ((norm(theta-theta_last, "fro")>0.01))`
And now I am getting an average N = 342 and Eout1 = 0.1, which is the expected answer.

Thank you.
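
For comparison with the Octave `Eout1` line, here is a rough Python equivalent of the cross-entropy error averaged over a set of points. It assumes each point is a list with the intercept term already prepended and labels of +1 or -1; the function name `cross_entropy_error` is hypothetical.

```python
import math

def cross_entropy_error(w, points, labels):
    """Average of ln(1 + exp(-y * w.x)) over the given points."""
    total = 0.0
    for x, y in zip(points, labels):
        s = sum(wi * xi for wi, xi in zip(w, x))  # the signal w.x
        total += math.log(1.0 + math.exp(-y * s))
    return total / len(points)
```

To estimate the out-of-sample error, this would be evaluated on a fresh set of points drawn in the same way as the training set, as the Octave code does.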

All times are GMT -7.