About Problem 3.17b
I don’t quite understand Problem 3.17b. What does it mean to minimize E1 over all possible (∆u, ∆v)? I would have expected to minimize E(u+∆u, v+∆v) instead, starting from the point (u,v)=(0,0). Does the optimal column vector [∆u, ∆v]^T correspond to v_t in the gradient descent algorithm (which, as the problem says, is determined by the gradient ∇E(u,v)), does the constraint ‖(∆u, ∆v)‖ = 0.5 correspond to the step size η, and does (u,v) correspond to the weight vector w? If so, what exactly does it mean to compute the optimal (∆u, ∆v)?
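To make my reading concrete, here is a minimal sketch of what I think "compute the optimal (∆u, ∆v)" might mean, assuming E1 is the first-order approximation E(u,v) + ∇E(u,v)·(∆u, ∆v) and the step is restricted to norm 0.5. The function name `optimal_step` and the gradient value `g` are placeholders I made up for illustration; `g` is not the gradient of the E in the problem.

```python
import math

def optimal_step(grad, radius=0.5):
    """Return the (du, dv) of norm `radius` that minimizes grad . (du, dv).

    Under the assumption above, the minimizer points along the negative
    gradient and is scaled so that its norm equals `radius`.
    """
    norm = math.hypot(grad[0], grad[1])
    if norm == 0.0:
        # With a zero gradient the first-order term is the same in every direction.
        raise ValueError("gradient is zero; no unique optimal direction")
    return (-radius * grad[0] / norm, -radius * grad[1] / norm)

# Hypothetical gradient at the starting point, for illustration only.
g = (3.0, 4.0)
du, dv = optimal_step(g)
print(du, dv)
```

If this reading is right, then (∆u, ∆v) would play the role of η·v_t, with 0.5 acting as the step size and (u,v) as the weight vector w.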
