#1
I didn't entirely understand what coordinate descent means. This is what I believe it to be: instead of descending "simultaneously" along all the coordinates as in gradient descent (in this example, both u and v), we first descend along u, find the new u, and then find v. So, when computing v, the new value of u is to be used. Am I right?
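In code, my understanding looks like this (a minimal sketch; I'm assuming the error surface from the homework, E(u, v) = (u e^v - 2v e^{-u})^2, with eta = 0.1, starting point (1, 1), and 15 iterations; these constants are not stated in this thread, so swap in your own if they differ):

```python
import math

# Assumed error surface from the assignment: E(u, v) = (u*e^v - 2v*e^{-u})^2
def E(u, v):
    return (u * math.exp(v) - 2.0 * v * math.exp(-u)) ** 2

def dE_du(u, v):
    # partial derivative of E with respect to u
    return 2.0 * (u * math.exp(v) - 2.0 * v * math.exp(-u)) * (math.exp(v) + 2.0 * v * math.exp(-u))

def dE_dv(u, v):
    # partial derivative of E with respect to v
    return 2.0 * (u * math.exp(v) - 2.0 * v * math.exp(-u)) * (u * math.exp(v) - 2.0 * math.exp(-u))

eta = 0.1
u, v = 1.0, 1.0

for _ in range(15):
    # Step 1: move along the u coordinate only, using the current v
    u = u - eta * dE_du(u, v)
    # Step 2: move along the v coordinate only, using the freshly updated u
    v = v - eta * dE_dv(u, v)

print(u, v, E(u, v))
```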
#2
__________________
Where everyone thinks alike, no one thinks very much
#3
I am struggling to understand what I did wrong on this question.
The instructions are clear, I followed the method above (I think?), and my answers to the related questions (5 and 6) were correct, but my answer to question 7 is far less than the correct answer. I reached that level of accuracy in only 5 iterations (instead of 15), so I must have a serious problem with my algorithm. I am wondering if I misunderstand the phrase "only to reduce error". I took it to mean that after each step I recalculate the error, and if the error increased I do not apply the update. This sped up convergence significantly. While researching why my answer was wrong, I ran across some conflicting references suggesting that coordinate descent can be a much more efficient algorithm than gradient descent because of tricks that reuse parts of the calculation. I'm not sure what to think.
#4
I see where the misunderstanding is. The word 'only' is meant to qualify the previous part: 'move along the u coordinate only to reduce the error.' That said, evaluating the error and then undoing the step is not indicated, given the part that follows: '(assume the first-order approximation holds, as in gradient descent).'
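To make the two readings concrete, a minimal sketch (reusing the same hypothetical E, dE_du, and eta assumed in the sketch under post #1; they are not stated in this thread):

```python
import math

# Same assumed setup as the sketch in post #1.
def E(u, v):
    return (u * math.exp(v) - 2.0 * v * math.exp(-u)) ** 2

def dE_du(u, v):
    return 2.0 * (u * math.exp(v) - 2.0 * v * math.exp(-u)) * (math.exp(v) + 2.0 * v * math.exp(-u))

eta = 0.1
u, v = 1.0, 1.0

# Reading from post #3 (not what the problem intends):
# re-evaluate the error and undo the step if it went up.
E_before = E(u, v)
u_trial = u - eta * dE_du(u, v)
if E(u_trial, v) < E_before:
    u = u_trial  # keep the step only if the error decreased

# Reading the problem intends: take the first-order step
# unconditionally, exactly as a gradient descent step would.
u = u - eta * dE_du(u, v)
```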
__________________
Where everyone thinks alike, no one thinks very much