  #2  
05-04-2013, 07:21 PM
yaser
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,478
Re: Hw5 q8 data permutation

Quote:
Originally Posted by marek
I must be missing something, but I do not understand why we permute the data.

\nabla E_{in} = -\frac{1}{N}\sum_{n=1}^{N} \frac{y_n x_n}{1+e^{y_n w^{\top} x_n}} treats each data point separately, but then sums them all up. Thus, even if we do permute the data points, in the end it all gets combined together in this sum. What am I overlooking?
True. If we were using batch mode, permutation would not change anything, since the weight update is done at the end of the epoch and takes all the examples into account regardless of the order in which they were presented. In stochastic gradient descent, however, the update is done after each example, so the order affects the outcome. The permutations ensure that this order is randomized in every epoch, so we get the benefits of randomness that were mentioned briefly in Lecture 9.
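
For concreteness, here is a minimal sketch in Python of the permuted SGD loop for logistic regression (the function name, learning rate, and number of epochs are just illustrative, not part of the homework specification):

Code:
import numpy as np

def sgd_logistic(X, y, eta=0.01, epochs=100, rng=None):
    """Stochastic gradient descent for logistic regression (sketch).

    X is an N x d array of inputs (add a bias coordinate yourself if
    needed), y is an N-vector of +/-1 labels, eta is the learning rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Permute the order of the examples at the start of each epoch,
        # so the per-example updates are applied in a random order.
        for n in rng.permutation(N):
            # Gradient of the cross-entropy error on a single example:
            # -y_n x_n / (1 + exp(y_n w^T x_n))
            grad = -y[n] * X[n] / (1.0 + np.exp(y[n] * np.dot(w, X[n])))
            w -= eta * grad
        # A batch version would instead average the gradient over all N
        # examples and update w once per epoch, so the order of the
        # examples would not matter.
    return w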
__________________
Where everyone thinks alike, no one thinks very much