#1
I've been wondering how exactly SGD is an improvement over "batch" gradient descent.
In batch mode, you go through all the points and then update the weights at the end. In SGD, you go through the points and update the weights after each point. The math says that on average you end up at the same location, but SGD takes a more "wiggly" path to get there. Then you iterate until the termination condition holds. So, the way it looks to me, each "epoch" in SGD is really just the same thing as a "step" in batch mode: you do the same amount of computation. How exactly does SGD provide an improvement?
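To make the comparison concrete, here is a minimal sketch in Python/NumPy of the two update schemes described above. The least-squares toy problem, function names, and step size are illustrative assumptions, not from this thread. Note the structural difference: both routines evaluate the same N per-point gradients per pass through the data, but the batch version folds them into one update, while SGD applies N separate updates, each computed at the most recent weights.

```python
import numpy as np

def batch_gd_step(w, X, y, eta):
    """One batch step: average the gradient over all N points,
    then apply a single weight update."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    return w - eta * grad

def sgd_epoch(w, X, y, eta, rng):
    """One SGD epoch: the same N gradient evaluations, but the weights
    are updated after every point, so later gradients see fresher weights."""
    for i in rng.permutation(len(y)):          # visit the points in random order
        grad = 2.0 * X[i] * (X[i] @ w - y[i])  # gradient at a single point
        w = w - eta * grad
    return w

# Toy data: y ~ 3*x plus noise (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=100)

w_batch = np.zeros(1)
w_sgd = np.zeros(1)
for _ in range(20):  # 20 passes through the data for each method
    w_batch = batch_gd_step(w_batch, X, y, eta=0.1)
    w_sgd = sgd_epoch(w_sgd, X, y, eta=0.1, rng=rng)

print(w_batch, w_sgd)  # both approach [3.]; SGD gets there by a noisier path
```

So for the same gradient budget per pass, SGD moves the weights N times instead of once, which is where its "wiggly" but often faster progress comes from.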
#2
__________________
Where everyone thinks alike, no one thinks very much
#3
I think I understand now. Thanks!
#4
#5
Indeed, this is what distinguishes SGD from batch mode.
__________________
Where everyone thinks alike, no one thinks very much