LFD Book Forum Batch vs. SGD

#1
05-07-2012, 01:41 PM
 kurts Invited Guest Join Date: Apr 2012 Location: Portland, OR Posts: 70
Batch vs. SGD

I've been wondering how exactly SGD is an improvement over "batch" gradient descent.

In batch mode, you go through all the points and then update the weights at the end.

In SGD, you go through all the points and update the weights after each point.

The math says that, on average, you end up at the same location, but SGD takes a more "wiggly" path to get there.

Then, you iterate until the termination condition holds. So, the way it looks to me, each "epoch" in SGD is really just the same thing as a "step" in batch mode. You do the same amount of computation. How exactly does SGD provide an improvement?
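A minimal sketch of the two schemes being compared (my own illustration, not from the thread; the toy linear-regression data, learning rate, and epoch count are all assumed):

```python
import numpy as np

# Toy linear-regression setup (hypothetical numbers, just for illustration).
rng = np.random.default_rng(0)
N, d = 100, 3
X = rng.normal(size=(N, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=N)

eta = 0.1  # learning rate

def batch_epoch(w):
    # Batch GD: one pass over all N points, then a single weight update
    # using the average gradient of the squared error.
    grad = (2.0 / N) * X.T @ (X @ w - y)
    return w - eta * grad

def sgd_epoch(w):
    # SGD: one weight update per point; each point sees the newest w.
    for n in rng.permutation(N):
        grad_n = 2.0 * (X[n] @ w - y[n]) * X[n]
        w = w - eta * grad_n
    return w

# Same number of epochs, hence the same number of gradient evaluations.
w_batch = np.zeros(d)
w_sgd = np.zeros(d)
for _ in range(20):
    w_batch = batch_epoch(w_batch)
    w_sgd = sgd_epoch(w_sgd)
```

After one epoch each scheme has touched every point exactly once, so the computation per epoch is indeed the same; the difference is that SGD has taken N small steps using progressively updated weights, while batch GD has taken one.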
#2
05-07-2012, 01:53 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477
Re: Batch vs. SGD

Quote:
 Originally Posted by kurts I've been wondering how exactly SGD is an improvement over "batch" gradient descent. In batch mode, you go through all the points and then update the weights at the end. In SGD, you go through all the points and update the weights after each point. The math says that, on average, you end up at the same location, but SGD takes a more "wiggly" path to get there. Then, you iterate until the termination condition holds. So, the way it looks to me, each "epoch" in SGD is really just the same thing as a "step" in batch mode. You do the same amount of computation. How exactly does SGD provide an improvement?
It allows you to move further per example while maintaining the linear approximation. Each example in batch GD has an effective learning rate of η/N, while in SGD it is η. Granted that the per-example gradients are often different so you don't get a gain factor of N, but you do get a gain factor (roughly √N under idealized assumptions).
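A rough numerical check of that √N figure (my own sketch, not from the post): model the N per-example update directions as independent random unit vectors. One batch step of size η moves the weights by η, while N SGD steps of size η in independent directions net about η√N of movement, by the usual random-walk argument:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, trials = 100, 10, 2000  # assumed toy dimensions

# Idealized per-example update directions: i.i.d. random unit vectors.
steps = rng.normal(size=(trials, N, d))
steps /= np.linalg.norm(steps, axis=2, keepdims=True)

# Net displacement of N unit steps, averaged over trials.  For independent
# directions this concentrates near sqrt(N) = 10, so N SGD steps of size
# eta net about eta*sqrt(N), versus eta for a single batch step.
net = np.linalg.norm(steps.sum(axis=1), axis=1).mean()
print(net)  # close to sqrt(100) = 10
```

The idealization (fully independent directions) is what makes the gain exactly √N here; in practice the per-example gradients are correlated, so the actual gain sits somewhere between 1 and N.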
__________________
Where everyone thinks alike, no one thinks very much
#3
05-07-2012, 01:58 PM
 kurts Invited Guest Join Date: Apr 2012 Location: Portland, OR Posts: 70
Re: Batch vs. SGD

I think I understand, now. Thanks!
#4
08-14-2012, 09:55 AM
 gah44 Invited Guest Join Date: Jul 2012 Location: Seattle, WA Posts: 153
Re: Batch vs. SGD

Quote:
 Originally Posted by yaser It allows you to move further per example while maintaining the linear approximation. Each example in batch GD has an effective learning rate of η/N, while in SGD it is η. Granted that the per-example gradients are often different so you don't get a gain factor of N, but you do get a gain factor (roughly √N under idealized assumptions).
Is it also because you use the new values immediately, instead of waiting for the whole batch? There are many problems where a not-so-obvious √N comes out, such as the random walk.
#5
08-14-2012, 02:46 PM
 yaser Caltech Join Date: Aug 2009 Location: Pasadena, California, USA Posts: 1,477
Re: Batch vs. SGD

Quote:
 Originally Posted by gah44 Is it also because you use the new values immediately, instead of waiting for the whole batch?
Indeed, this is what distinguishes SGD from the batch mode.
__________________
Where everyone thinks alike, no one thinks very much
