Quote:
Originally Posted by bargava
I didn't entirely understand what coordinate descent meant. This is what I believe it to be: instead of descending "simultaneously" along all the coordinates as in gradient descent (in this example, both u and v), we first descend along u to find the new u, and then descend along v. So, when computing v, the new value of u is to be used. Am I right?

Correct. After each update along one coordinate, you recompute the partial derivative at the new point and then descend along the other coordinate. This is not an efficient method; it is meant only for comparison with gradient descent.
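
To make the difference concrete, here is a minimal sketch in Python. The error function E(u, v) = u^2 + 3v^2 + uv, the learning rate eta, and the function names are all assumptions chosen for illustration; they are not the function from the problem being discussed. The only point is where each partial derivative gets evaluated.

```python
# Hypothetical error surface for illustration only (not the one from the problem).
def E(u, v):
    return u**2 + 3 * v**2 + u * v

def dE_du(u, v):
    return 2 * u + v      # partial derivative with respect to u

def dE_dv(u, v):
    return 6 * v + u      # partial derivative with respect to v

def gradient_descent(u, v, eta=0.1, steps=100):
    for _ in range(steps):
        # Both partial derivatives are evaluated at the same (old) point.
        gu, gv = dE_du(u, v), dE_dv(u, v)
        u, v = u - eta * gu, v - eta * gv
    return u, v

def coordinate_descent(u, v, eta=0.1, steps=100):
    for _ in range(steps):
        # Step 1: move along u only.
        u = u - eta * dE_du(u, v)
        # Step 2: move along v only, using the new u from step 1.
        v = v - eta * dE_dv(u, v)
    return u, v

if __name__ == "__main__":
    print("gradient descent:  ", gradient_descent(1.0, 1.0))
    print("coordinate descent:", coordinate_descent(1.0, 1.0))
```

Starting from (1, 1), a single iteration already shows the difference: both methods move u to 0.7, but gradient descent computes the v step from the old u = 1 while coordinate descent computes it from the new u = 0.7, so they end up at different values of v.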