LFD Book Forum  

  #1  
Old 05-07-2012, 01:41 PM
kurts
Invited Guest
 
Join Date: Apr 2012
Location: Portland, OR
Posts: 70
Batch vs. SGD

I've been wondering how exactly SGD is an improvement over "batch" gradient descent.

In batch mode, you go through all the points and then update the weights at the end.

In SGD, you go through all the points and update the weights after each point.

The math says that, on average, you end up at the same location, but SGD takes a more "wiggly" path to get there.

Then you iterate until the termination condition holds. So, the way it looks to me, each "epoch" in SGD is really just the same thing as a "step" in batch mode. You do the same amount of computation either way. How exactly does SGD provide an improvement?
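
To make the comparison concrete, here is a minimal sketch of the two loops as I understand them (plain Python; the one-variable squared-error model and the data points are made up for illustration):

Code:

import random

# Toy one-dimensional linear model h(x) = w*x with
# per-point squared error e_n(w) = (w*x_n - y_n)^2.
def grad_point(w, x, y):
    # d/dw of (w*x - y)^2
    return 2.0 * (w * x - y) * x

def batch_step(w, data, eta):
    # Batch GD: average the gradient over all N points, then update once.
    g = sum(grad_point(w, x, y) for x, y in data) / len(data)
    return w - eta * g

def sgd_epoch(w, data, eta):
    # SGD: update after every single point, visiting them in random order.
    points = list(data)
    random.shuffle(points)
    for x, y in points:
        w -= eta * grad_point(w, x, y)
    return w

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.0)]  # made-up points
print(batch_step(0.0, data, eta=0.05))
print(sgd_epoch(0.0, data, eta=0.05))

Both versions touch all N points once, which is why one SGD epoch looks to me like the same amount of work as one batch step.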
  #2  
Old 05-07-2012, 01:53 PM
yaser
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Batch vs. SGD

Quote:
Originally Posted by kurts View Post
I've been wondering how exactly SGD is an improvement over "batch" gradient descent. [...] So, the way it looks to me, each "epoch" in SGD is really just the same thing as a "step" in batch mode. You do the same amount of computation. How exactly does SGD provide an improvement?
It allows you to move further per example while maintaining the linear approximation. Each example in batch GD has an effective learning rate of \eta/N, while in SGD it is \eta. Granted, the \eta's are often different, so you don't get a gain factor of N, but you do get a gain factor (roughly \sqrt{N} under idealized assumptions).
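
To spell that out: writing the in-sample error as E_{in}(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} e_n(\mathbf{w}), the two update rules are

\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla E_{in}(\mathbf{w}) = \mathbf{w} - \sum_{n=1}^{N} \frac{\eta}{N} \nabla e_n(\mathbf{w}) \quad \text{(batch GD)}

\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla e_n(\mathbf{w}) \quad \text{(SGD, for a single randomly chosen } n\text{)}

so each example enters the batch update with coefficient \eta/N, but the SGD update with coefficient \eta.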
__________________
Where everyone thinks alike, no one thinks very much
  #3  
Old 05-07-2012, 01:58 PM
kurts
Invited Guest
 
Join Date: Apr 2012
Location: Portland, OR
Posts: 70
Re: Batch vs. SGD

I think I understand now. Thanks!
  #4  
Old 08-14-2012, 09:55 AM
gah44
Invited Guest
 
Join Date: Jul 2012
Location: Seattle, WA
Posts: 153
Re: Batch vs. SGD

Quote:
Originally Posted by yaser View Post
Each example in batch GD has an effective learning rate of \eta/N, while in SGD it is \eta. [...] you do get a gain factor (roughly \sqrt{N} under idealized assumptions).
Is it also because you use the new values immediately, instead of waiting for the whole batch? There are many problems where a not-so-obvious \sqrt{N} comes out, such as the random walk.
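
(For what it's worth, that \sqrt{N} is easy to check numerically; a throwaway Python sketch, with the step and trial counts picked arbitrarily:)

Code:

import random

# RMS displacement of a +/-1 random walk after N steps grows like sqrt(N).
def rms_displacement(n_steps, n_trials=10000):
    total = 0.0
    for _ in range(n_trials):
        pos = sum(random.choice((-1, 1)) for _ in range(n_steps))
        total += pos * pos
    return (total / n_trials) ** 0.5

for n in (100, 400, 1600):
    print(n, rms_displacement(n))  # roughly 10, 20, 40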
  #5  
Old 08-14-2012, 02:46 PM
yaser
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Batch vs. SGD

Quote:
Originally Posted by gah44 View Post
Is it also because you use the new values immediately, instead of waiting for the whole batch?
Indeed, this is what distinguishes SGD from batch mode.
__________________
Where everyone thinks alike, no one thinks very much