Holland, I think you are right. I thought the same thing, and I ended up trying it with the whole dot product in the denominator, and I am getting a much better final set of w values.
Instead of just plugging individual values into the exponential, I go ahead and calculate the whole dot product and use the same denominator for each element of the gradient.
Code:
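#include <math.h>

/* One component of the gradient for sample n.
   xn is the input coordinate for that component; wtx is the full dot product. */
double altgradient(double yn, double xn, double wtx) {
    double num = -1.0*yn*xn;
    double denom = 1 + exp(yn*wtx);
    return num / denom;
}

/* Full dot product w.x, with the constant coordinate x0 = 1. */
double calcWTX(double x1, double x2, double w0, double w1, double w2) {
    double sum = w0 + w1*x1 + w2*x2;
    return sum;
}

/* For each sample n: compute the dot product once, then reuse it
   for all three gradient components. */
double wtx = calcWTX(x1n, x2n, w0, w1, w2);
double g0 = altgradient(yn, 1.0, wtx);
double g1 = altgradient(yn, x1n, wtx);
double g2 = altgradient(yn, x2n, wtx);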
This is giving me a much better approximation of f. However, it does increase the number of steps by quite a bit.
Thanks!