LFD Book Forum Batch vs. SGD

#1
05-07-2012, 01:41 PM
 kurts
Batch vs. SGD

I've been wondering how exactly SGD is an improvement over "batch" gradient descent.

In batch mode, you go through all the points and then update the weights at the end.

In SGD, you go through all the points and update the weights after each point.

The math says that, on average, you end up at the same location, but SGD takes a more "wiggly" path to get there.

Then, you iterate until the termination condition holds. So, the way it looks to me, each "epoch" in SGD is really just the same thing as a "step" in batch mode. You do the same amount of computation. How exactly does SGD provide an improvement?
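To make the comparison concrete, here is a minimal sketch in plain Python of one batch step versus one SGD epoch, each costing one pass over the data. Everything in it is an assumption chosen for illustration and not taken from the thread: the synthetic data (`make_data`), the squared-error loss, the helper names, and the learning rate `eta = 0.1`.

```python
import random

def make_data(n, seed=0):
    """Synthetic 1-D data y = 2x + 1 + noise (hypothetical example)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data

def mse(w, data):
    """Mean squared error of the line w[0] + w[1] * x."""
    return sum((w[0] + w[1] * x - y) ** 2 for x, y in data) / len(data)

def batch_epoch(w, data, eta):
    """One batch step: average the gradient over all points, update once."""
    g0 = g1 = 0.0
    for x, y in data:
        err = w[0] + w[1] * x - y
        g0 += 2.0 * err
        g1 += 2.0 * err * x
    n = len(data)
    return [w[0] - eta * g0 / n, w[1] - eta * g1 / n]

def sgd_epoch(w, data, eta, rng):
    """One SGD epoch: visit points in random order, update after each one."""
    w = list(w)
    order = list(range(len(data)))
    rng.shuffle(order)
    for i in order:
        x, y = data[i]
        # The error is evaluated at the already-updated weights.
        err = w[0] + w[1] * x - y
        w[0] -= eta * 2.0 * err
        w[1] -= eta * 2.0 * err * x
    return w

data = make_data(100)
rng = random.Random(1)
w_batch, w_sgd = [0.0, 0.0], [0.0, 0.0]
eta = 0.1  # same nominal learning rate for both, an arbitrary choice
for epoch in range(5):
    w_batch = batch_epoch(w_batch, data, eta)
    w_sgd = sgd_epoch(w_sgd, data, eta, rng)
    print(epoch, round(mse(w_batch, data), 4), round(mse(w_sgd, data), 4))
```

Both loops touch every point once per epoch, so the computation per epoch is comparable; the printed errors show how far each method gets for that same cost on this toy problem.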
#2
05-07-2012, 01:53 PM
 yaser
Re: Batch vs. SGD

It allows you to move further per example while maintaining the linear approximation. Each example in batch GD has an effective learning rate of $\eta/N$, while in SGD it is $\eta$. Granted that the $\eta$'s are often different so you don't get a gain factor of $N$, but you do get a gain factor (roughly $\sqrt{N}$ under idealized assumptions).
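A sketch of where the per-example rates come from, assuming the in-sample error $E_{\text{in}}$ is the average of per-example errors $e_n$ over $N$ examples and $\eta$ is the learning rate (this notation is an assumption, not spelled out in the post):

```latex
\begin{align*}
E_{\text{in}}(\mathbf{w}) &= \frac{1}{N}\sum_{n=1}^{N} e_n(\mathbf{w}) \\
\text{batch GD:}\quad \mathbf{w} &\leftarrow \mathbf{w} - \eta\,\nabla E_{\text{in}}(\mathbf{w})
  = \mathbf{w} - \sum_{n=1}^{N} \frac{\eta}{N}\,\nabla e_n(\mathbf{w}) \\
\text{SGD:}\quad \mathbf{w} &\leftarrow \mathbf{w} - \eta\,\nabla e_n(\mathbf{w})
\end{align*}
```

In the batch update each example's gradient is weighted by $\eta/N$, while in the SGD update a single example's gradient is weighted by the full $\eta$.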
#3
05-07-2012, 01:58 PM
 kurts
Re: Batch vs. SGD

I think I understand, now. Thanks!
#4
08-14-2012, 09:55 AM
 gah44
Re: Batch vs. SGD

Is it also because you use the new values immediately, instead of waiting for the whole batch? There are many problems where a not-so-obvious $\sqrt{N}$ comes out, such as a random walk.
#5
08-14-2012, 02:46 PM
 yaser
Re: Batch vs. SGD

Quote:
 Originally Posted by gah44 Is it also because you use the new values immediately, instead of waiting for the whole batch?
Indeed, this is what distinguishes SGD from the batch mode.
