I think I may be missing something concerning SGD. The description of the problems for Logistic Regression states that:

*An epoch is a full pass through the data points*.

Does that mean that each of the given answers corresponds to 100x the number of iterations (e.g. answer [**a**] is 350, so we make 35,000 iterations before converging)?

The way I am currently implementing SGD is as follows (pseudo-code):

Code:

for each trial {
    generate N points, with corresponding target function and Y
    set weight vector to |0|
    while true {
        pick random point xn, and corresponding yn
        calculate e = -yn*xn/(1 + exp(yn*wt*xn))
        w(t+1) = w(t) - learning_rate*e
        break if ||w(t+1) - w(t)|| < tolerance
    }
    use weight vector to calculate Eout = E[ln(1 + exp(-Y*w*X))], for M new points, and find the mean Eout
}

In this case, I am taking the mean of the number of iterations in the inner while loop as the answer for Q9. Am I doing something wrong?
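In case it helps to pin down what I mean, here is a minimal runnable sketch of one trial of the loop above, in Python with NumPy. The specific values (`n_points=100`, `learning_rate=0.01`, `tolerance=0.01`, `n_test=1000`), the `max_iters` safety cap, the function name `run_trial`, and the choice of a random line in [-1, 1]^2 as the target are all placeholders for illustration, not something taken from the problem text quoted here.

```python
import numpy as np

def run_trial(rng, n_points=100, learning_rate=0.01, tolerance=0.01,
              n_test=1000, max_iters=1_000_000):
    """One trial of the loop above; returns (iterations, estimated Eout)."""
    # Assumed target: the line through two random points p, q in [-1, 1]^2.
    p, q = rng.uniform(-1, 1, size=(2, 2))

    def label(X):
        # Sign of the 2-D cross product says which side of the line a point is on.
        side = (q[0] - p[0]) * (X[:, 2] - p[1]) - (q[1] - p[1]) * (X[:, 1] - p[0])
        return np.where(side >= 0, 1.0, -1.0)

    def sample(m):
        # Each row is (1, x1, x2); the leading 1 is the bias coordinate.
        X = np.column_stack([np.ones(m), rng.uniform(-1, 1, size=(m, 2))])
        return X, label(X)

    X, y = sample(n_points)
    w = np.zeros(3)
    iterations = 0
    while iterations < max_iters:
        n = rng.integers(n_points)  # pick one random training point
        grad = -y[n] * X[n] / (1.0 + np.exp(y[n] * (w @ X[n])))
        w_new = w - learning_rate * grad
        iterations += 1
        converged = np.linalg.norm(w_new - w) < tolerance
        w = w_new
        if converged:  # checked after every single-point update
            break

    X_test, y_test = sample(n_test)
    # logaddexp(0, z) equals ln(1 + exp(z)), evaluated in a numerically stable way.
    e_out = np.mean(np.logaddexp(0.0, -y_test * (X_test @ w)))
    return iterations, e_out

iterations, e_out = run_trial(np.random.default_rng(0))
print(iterations, e_out)
```

The count returned here is in single-point updates, not epochs, because the stopping test runs after every update exactly as in my pseudo-code. With these placeholder values the very first step has norm at most learning_rate * sqrt(3) / 2, which is already below the tolerance, so the sketch stops after one update.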

Thanks