LFD Book Forum

LFD Book Forum (http://book.caltech.edu/bookforum/index.php)
-   Homework 5 (http://book.caltech.edu/bookforum/forumdisplay.php?f=134)
-   -   Evaluating a gradient function with vectors (http://book.caltech.edu/bookforum/showthread.php?t=450)

kurts 05-05-2012 12:46 PM

Evaluating a gradient function with vectors
Let's say we are using SGD with a gradient function
-(yn*xn)/(1 + e^(yn*w*xn)) where xn and w are 3-element vectors.

When I evaluate this function, can I evaluate it 3 times, once for each corresponding x[i] and w[i], and thus get a 3-element gradient vector, then update w from that vector?
Something like:


double gradient(double yn, double xn, double wn) {
  double num = -1.0*yn*xn;
  double denom = 1 + exp(yn*wn*xn);
  return num / denom;
}

double g0 = gradient(yn, 1.0, w0);
double g1 = gradient(yn, x1n, w1);
double g2 = gradient(yn, x2n, w2);
w0 = w0 - eta*g0;
w1 = w1 - eta*g1;
w2 = w2 - eta*g2;

I'm not entirely confident that this is correct.

holland 05-05-2012 09:28 PM

Re: Evaluating a gradient function with vectors
I think your approach is wrong for the denominator. My understanding is that w*x is an inner product that produces a scalar, so the only vector part of the gradient is the yn*xn in the numerator of the fraction.

(In particular, all 3 w and x terms would be used in the exponential in the denominator for each of the 3 values in the gradient. But your numerator would use the 3 different x's for the 3 different terms.)

kkkkk 05-05-2012 09:50 PM

Re: Evaluating a gradient function with vectors
Basically you have to calculate the partial derivative of the error function with respect to each dimension of the w vector.
Then update each dimension of w using its partial derivative, the current w vector, and the step -eta.
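Written out for the logistic-regression error e(w) = ln(1 + e^{-y_n w^T x_n}) used in the book, the partial derivatives and the per-dimension update are:

\frac{\partial e}{\partial w_i} = \frac{-\,y_n\, x_{n,i}}{1 + e^{\,y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n}},
\qquad
w_i \leftarrow w_i - \eta\,\frac{\partial e}{\partial w_i}

Note the exponent y_n w^T x_n is the same scalar in every one of the d+1 partial derivatives; only the x_{n,i} factor in the numerator changes with i.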

kurts 05-05-2012 10:13 PM

Re: Evaluating a gradient function with vectors
Holland, I think you are right. I thought the same thing, and when I tried it with the whole dot product in the denominator, I got a much better final set of w values.

Instead of just plugging individual w_i*x_i values into the exponential, I go ahead and calculate the whole w^T x_n dot product and use the same denominator for each element of the gradient.


double altgradient(double yn, double xn, double wtx) {
  double num = -1.0*yn*xn;
  double denom = 1 + exp(yn*wtx);
  return num / denom;
}

double calcWTX(double x1, double x2, double w0, double w1, double w2) {
  double sum = w0 + w1*x1 + w2*x2;
  return sum;
}

double wtx = calcWTX(x1n, x2n, w0, w1, w2);
double g0 = altgradient(yn, 1.0, wtx);
double g1 = altgradient(yn, x1n, wtx);
double g2 = altgradient(yn, x2n, wtx);

This is giving me a much better approximation of f. However, it does increase the number of steps by quite a bit.


sakumar 05-08-2012 01:49 AM

Re: Evaluating a gradient function with vectors
I realized that the numerator term (y_n*x_n), taken over all n, is a 3 x N matrix that doesn't change for a particular data set. So I evaluate it once before I start the logistic regression loop. Inside the loop, only the denominator has to be re-evaluated, because w changes on every iteration.
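sakumar's precomputation can be sketched as follows (N, the array layout, and the function name are assumed for illustration): the -y_n * x_n products are filled in once up front, and the training loop then only multiplies each row by its fresh 1/(1 + e^{y_n w^T x_n}) factor.

```c
/* Precompute the constant numerators -y[n] * x[n][i] once,
   before the logistic-regression training loop starts. */
#define N 4
void precompute_num(const double x[N][3], const double y[N],
                    double num[N][3]) {
    for (int n = 0; n < N; n++)
        for (int i = 0; i < 3; i++)
            num[n][i] = -y[n] * x[n][i];
}
```

Inside the loop, the gradient for example n is then just num[n][i] / (1 + exp(y[n]*wtx)), with only the denominator recomputed as w changes.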


Powered by vBulletin® Version 3.8.3
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.
The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.