LFD Book Forum: Homework 5
#1 | 05-07-2013, 04:56 PM
arcticblue (Member; Join Date: Apr 2013; Posts: 17)
*ANSWER* q8/9

I got both of these questions wrong and I'm not sure what I've done wrong. Is anyone who got these right willing to post their code so that I can compare it with mine and work out where I went wrong? (I could post my code, but it seems cruel to ask others to understand it.)

Any language is good, but I'd prefer Python or C/C++/Java; R or Octave are okay too.
#2 | 05-07-2013, 05:47 PM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: *ANSWER* q8/9

Quote:
Originally Posted by arcticblue
I got both of these questions wrong and I'm not sure what I've done wrong. [...]
Other people's code may not tell you much more than the English description in the question, so comparing against that is really the key. Making a detailed list of the things the question says your program should implement may be enlightening. (Most obvious guess: the stopping criterion?)
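To illustrate just that one piece: the exit condition the question specifies is on how far the weight vector moves over a whole epoch. Here is a minimal Python sketch of that skeleton (one_epoch is a hypothetical stand-in for your own full SGD pass over a fresh random permutation of the data):

Code:
import numpy as np

# one_epoch is a stand-in for your own code: one full SGD pass over
# the data in a fresh random order, returning the updated weights.
w = np.zeros(3)
epochs = 0
while True:
    w_before = w.copy()
    w = one_epoch(w, X, y, eta=0.01)
    epochs += 1
    # Stop only when the weights barely moved over the entire epoch.
    if np.linalg.norm(w - w_before) < 0.01:
        break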
#3 | 05-07-2013, 06:22 PM
apbarraza (Junior Member; Join Date: Jan 2013; Posts: 4)
Re: *ANSWER* q8/9

This is my code in Octave (it is not correct, but maybe you can help me find what is wrong):

Code:
function [N, theta, Eout] = trainLogisticRegression(eta, mfunc, bfunc, numPointsPerEpoch)
% Initialize some useful values
N = 0;
theta = zeros(3, 1);
theta_last = theta + 1;

while ((abs(theta-theta_last)>0.01)==1), % Iterate until convergence
    N = N + 1;
    theta_last = theta;
    % Generate points for the epoch
    [X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
    % Add intercept term
    X = [ones(numPointsPerEpoch, 1) X];
    % Stochastic gradient descent over this epoch
    for i = 1:numPointsPerEpoch
        e = y(i).*X(i, :)./(1+exp(y(i)*(theta'*X(i,:)')));
        % Adjust parameters along the gradient
        theta = theta + eta*e';
    end;
end

% New set of points to calculate the error
[X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
% Add intercept term
X = [ones(numPointsPerEpoch, 1) X];

% Error measure (cross-entropy)
Eout = 1/numPointsPerEpoch*(sum(log( 1 + exp(-1*y.*(theta'*X')'))));

end
I run this 100 times and average N and Eout to get the requested answers.
Nonetheless, I'm missing something that I just can't quite pin down.
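For reference, the per-point update I am trying to implement (what the e and theta lines inside the loop compute) is the standard logistic-regression SGD step:

$$\mathbf{w} \leftarrow \mathbf{w} + \eta\,\frac{y_n\,\mathbf{x}_n}{1 + e^{\,y_n\,\mathbf{w}^{\mathsf{T}}\mathbf{x}_n}}$$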

Any help is appreciated.
#4 | 05-07-2013, 07:03 PM
apbarraza (Junior Member; Join Date: Jan 2013; Posts: 4)
Re: *ANSWER* q8/9

I think I fixed the problem. I had confused myself: I was generating fresh random points for every epoch, when what is really required is a random permutation of the original training points at each epoch.
This is my fix:
Code:
function [N, theta, Eout] = trainLogisticRegression(eta, mfunc, bfunc, numPointsPerEpoch)
% Initialize some useful values
N = 0;
theta = zeros(3, 1);
theta_last = theta + 1;

% Generate training points (once, up front)
[X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
% Add intercept term
X = [ones(numPointsPerEpoch, 1) X];

while ((abs(theta-theta_last)>0.01)==1), % Iterate until convergence
    N = N + 1;
    theta_last = theta;
    % Random permutation of the training points for this epoch
    perm = randperm(numPointsPerEpoch);
    % Stochastic gradient descent over this epoch
    for i = 1:numPointsPerEpoch
        e = y(perm(i)).*X(perm(i), :)./(1+exp(y(perm(i))*(theta'*X(perm(i),:)')));
        % Adjust parameters along the gradient
        theta = theta + eta*e';
    end;
end

% New set of points to estimate the out-of-sample error
[X, y] = getRandomPoints(numPointsPerEpoch, mfunc, bfunc);
% Add intercept term
X = [ones(numPointsPerEpoch, 1) X];

% Error measures: cross-entropy (Eout1) and squared error (Eout2);
% h(theta, X) is my sigmoid hypothesis, defined elsewhere.
Eout1 = 1/numPointsPerEpoch*(sum(log( 1 + exp(-1*y.*(theta'*X')'))));
Eout2 = 1/numPointsPerEpoch*sum((h(theta,X) - 1/2*(1+y)).^2);

Eout = [Eout1 Eout2];

end
I am getting an average N of 37.100.
Eout1 is giving me an average of 0.3, which is not the required answer.
Eout2 is giving me an average of 0.09, which is close to the final answer.

I'm wondering if there is something still wrong with what I'm doing...
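For clarity, Eout1 above is the cross-entropy error from the lectures, estimated on the fresh sample:

$$E_{\text{out}} \approx \frac{1}{N}\sum_{n=1}^{N}\ln\!\left(1 + e^{-y_n\,\mathbf{w}^{\mathsf{T}}\mathbf{x}_n}\right)$$

while Eout2 is a squared-error measure I was experimenting with.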
#5 | 05-08-2013, 03:04 AM
arcticblue (Member; Join Date: Apr 2013; Posts: 17)
Re: *ANSWER* q8/9

Thanks for the suggestion, Elroch. Here are the steps I am following:

1) Generate a random set of data points, with values between -1 and +1. I am certain this bit of code is working correctly.
2) Set weight = [0,0,0] and eta = 0.01.
3) Do the stochastic gradient descent:
3a) Shuffle the 100 data points. (The shuffling is definitely working.)
3b) For the first shuffled data point, compute the gradient using the initial weight. To calculate the gradient I am using the code below; could this be wrong?
Code:
import math

# dot_product is my own helper: the plain dot product of two 3-vectors
def gradient_descent(weight, x, y):
    # Gradient of the cross-entropy error at a single point (x, y)
    denom = 1 + math.e**(y * dot_product(weight, x))
    return [-(y * x[0]) / denom,
            -(y * x[1]) / denom,
            -(y * x[2]) / denom]
3c) I then update the weight using the returned gradient. I do this by
Code:
error = gradient_descent(weight, x, y)  # gradient at the current point
weight = weight - eta * error           # step against the gradient
Again, maybe I shouldn't be using eta here, but it doesn't seem to make a difference.
3d) I repeat 3b/3c for all data points, using the updated weight for each new data point.
4) Once I have updated the weight based on all the data points, I compare the final weight of the epoch with the weight it started with.
4a) To compare the weights I use the following function, which takes the square root of the sum of squared differences (the Euclidean norm of the difference). Perhaps this is wrong?
Code:
def calc_error(new_weights, old_weights):
    # Euclidean norm of the difference between the two weight vectors
    return math.sqrt(sum((o - n)**2 for o, n in zip(old_weights, new_weights)))
4b) If calc_error() < 0.01, then stop.
4c) If the difference is still too large, go back to 3 and use the new weight as the initial weight.
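To make sure we're all talking about the same procedure, here is a stripped-down NumPy sketch of steps 2-4 (generate_points from step 1 is omitted; X and y below are the hypothetical outputs of my data generation, which I'm confident is correct):

Code:
import numpy as np

def sgd_epochs(X, y, eta=0.01, tol=0.01):
    # X: (100, 3) points with a leading 1 for the intercept;
    # y: (100,) labels in {-1, +1}
    weight = np.zeros(3)                                 # step 2
    epochs = 0
    while True:                                          # step 3
        start_weight = weight.copy()
        for i in np.random.permutation(len(X)):          # step 3a
            # step 3b: gradient of the cross-entropy error at one point
            grad = -y[i] * X[i] / (1 + np.exp(y[i] * np.dot(weight, X[i])))
            weight = weight - eta * grad                 # step 3c
        epochs += 1
        if np.linalg.norm(weight - start_weight) < tol:  # steps 4a/4b
            return weight, epochs
        # otherwise loop again with the new weight       # step 4c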

Well, if anyone can spot the mistake I've made, or even something that doesn't look right, please say something.
#6 | 05-08-2013, 04:54 AM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: *ANSWER* q8/9

Quote:
Originally Posted by apbarraza
I think I fixed the problem.

...

I'm wondering if there is something still wrong with what I'm doing...
Why would you think there was?
#7 | 05-08-2013, 05:30 AM
catherine (Member; Join Date: Apr 2013; Posts: 18)
Re: *ANSWER* q8/9

Hi arcticblue, my R code yields the expected average number of epochs and out-of-sample error, now that (thanks to you) I've implemented the correct exit condition for the SGD. Have you found the issue with your implementation? The approach you're outlining above seems correct. What kind of results are you getting? I'll have a look at your code if you post it.
#8 | 05-08-2013, 05:46 AM
catherine (Member; Join Date: Apr 2013; Posts: 18)
Re: *ANSWER* q8/9

Hi apbarraza, the problem seems to be with the way you compute the magnitude of the difference between the weight vectors at the beginning and the end of each epoch. abs is element-wise in Octave, so your while condition tests each component separately rather than the length of the difference vector. Use norm (with p = "fro") instead of abs to get the scalar magnitude the exit condition calls for. Also, Eout1 is the way to go. Good luck!
#9 | 05-08-2013, 05:58 AM
Elroch (Invited Guest; Join Date: Mar 2013; Posts: 143)
Re: *ANSWER* q8/9

arcticblue, both the description of what you intended to do and the code fragments in 3b, 3c and 4a look fine to me. Sherlock Holmes' famous maxim must apply: "when you have eliminated the impossible, whatever remains, however improbable, must be the truth" [i.e. the bug must be in what you haven't posted].
#10 | 05-08-2013, 01:42 PM
apbarraza (Junior Member; Join Date: Jan 2013; Posts: 4)
Re: *ANSWER* q8/9

Quote:
Originally Posted by catherine
Use norm (with p = "fro") instead of abs. Also, Eout1 is the way to go. [...]

YES!! Thank you sooo much, I have been banging my head and I can't believe I missed this. I changed the while condition to:

Code:
while ((norm(theta-theta_last, "fro")>0.01))
And now I am getting an average N = 342 and Eout1 = 0.1, which is the expected answer.

Thank you.