#1




Batch vs. SGD
I've been wondering how exactly SGD is an improvement over "batch" gradient descent.
In batch mode, you go through all the points and then update the weights once, at the end. In SGD, you go through the points and update the weights after each one. The math says that, on average, you end up at the same location, but SGD takes a more "wiggly" path to get there. Either way, you iterate until the termination condition holds. So, as far as I can tell, each "epoch" of SGD is really the same thing as one "step" in batch mode: you do the same amount of computation. How exactly does SGD provide an improvement?
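To make the comparison in the question concrete, here is a minimal sketch. It assumes linear regression with squared error on a small hypothetical noiseless dataset (y = 2x + 1), and the function names (`batch_step`, `sgd_epoch`, `in_sample_error`) are illustrative, not from any library. Both functions evaluate the per-point gradient exactly N times, so the computation matches. The difference is that the batch step turns those N gradients into one averaged update, while the SGD epoch makes N updates, each from the current (already improved) weights.

```python
import random

def grad_point(w, x, y):
    # Gradient of the squared error (w.x - y)^2 with respect to w, for one point.
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2 * err * xi for xi in x]

def in_sample_error(w, data):
    # Mean squared error over all points.
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in data) / len(data)

def batch_step(w, data, eta):
    # Batch mode: average the N per-point gradients, then make ONE update.
    n = len(data)
    g = [0.0] * len(w)
    for x, y in data:
        g = [gi + gpi / n for gi, gpi in zip(g, grad_point(w, x, y))]
    return [wi - eta * gi for wi, gi in zip(w, g)]

def sgd_epoch(w, data, eta, rng):
    # SGD: visit the N points in random order and update after EACH point.
    order = list(range(len(data)))
    rng.shuffle(order)
    for i in order:
        x, y = data[i]
        w = [wi - eta * gi for wi, gi in zip(w, grad_point(w, x, y))]
    return w

# Hypothetical data: x = [1, x1] (bias term first), target y = 2*x1 + 1.
data = [([1.0, k / 10], 2 * (k / 10) + 1) for k in range(-10, 11)]
w0 = [0.0, 0.0]
w_batch = batch_step(w0, data, eta=0.1)
w_sgd = sgd_epoch(w0, data, eta=0.1, rng=random.Random(0))
print("start:", in_sample_error(w0, data))
print("after 1 batch step:", in_sample_error(w_batch, data))
print("after 1 SGD epoch:", in_sample_error(w_sgd, data))
```

On this toy problem, one SGD epoch typically ends with a much lower in-sample error than one batch step, even though both used the same number of gradient evaluations. The per-point gradient of a uniformly random point equals the batch gradient in expectation, which is the sense in which the two agree "on average".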
#2




Re: Batch vs. SGD
Quote:
__________________
Where everyone thinks alike, no one thinks very much 
#3




Re: Batch vs. SGD
I think I understand, now. Thanks!

#4




Re: Batch vs. SGD
Quote:

#5




Re: Batch vs. SGD
Indeed, this is what distinguishes SGD from the batch mode.

