LFD Book Forum  

  #1  
Old 05-04-2013, 06:03 PM
marek
Member
 
Join Date: Apr 2013
Posts: 31
Hw5 q8 data permutation

I must be missing something, but I do not understand why we permute the data.

\nabla E_{in} = -\frac{1}{N}\sum_{n=1}^N \frac{y_n x_n}{1+e^{y_n w^{\top} x_n}} treats each data point separately, but then sums them all up. Thus, even if we do permute the data points, in the end it all gets combined together in this sum. What am I overlooking?
  #2  
Old 05-04-2013, 07:21 PM
yaser
Caltech
 
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,477
Re: Hw5 q8 data permutation

Quote:
Originally Posted by marek
I must be missing something, but I do not understand why we permute the data.

\nabla E_{in} = -\frac{1}{N}\sum_{n=1}^N \frac{y_n x_n}{1+e^{y_n w^{\top} x_n}} treats each data point separately, but then sums them all up. Thus, even if we do permute the data points, in the end it all gets combined together in this sum. What am I overlooking?
True. If we were applying batch mode, permutation would not change anything, since the weight update is done at the end of the epoch and takes all the examples into consideration regardless of the order in which they were presented. In stochastic gradient descent, however, the update is done after each example, so the order changes the outcome. These permutations ensure that the order is randomized, so we get the benefits of randomness that were mentioned briefly in Lecture 9.
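To make the contrast concrete, here is a minimal sketch in Python (a toy data set, a learning rate eta, and an epoch count n_epochs chosen purely for illustration; this is not the homework setup or solution). In batch mode the weights move once per epoch using the gradient averaged over all examples, so the order of the examples is irrelevant; in SGD the weights move after every single example, so drawing a fresh permutation each epoch changes the path the weights take.

[CODE]
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])  # prepend the constant coordinate
y = np.sign(X @ np.array([0.5, 1.0, -1.0]))                 # arbitrary linearly generated labels

eta, n_epochs = 0.01, 10                                    # illustrative values only

def grad_single(w, x_n, y_n):
    # gradient of the cross-entropy error on one example:
    # -y_n x_n / (1 + exp(y_n w^T x_n))
    return -y_n * x_n / (1.0 + np.exp(y_n * np.dot(w, x_n)))

# Batch gradient descent: one update per epoch, using the average over ALL
# examples, so permuting the data changes nothing.
w_batch = np.zeros(d + 1)
for _ in range(n_epochs):
    g = np.mean([grad_single(w_batch, X[n], y[n]) for n in range(N)], axis=0)
    w_batch -= eta * g

# Stochastic gradient descent: the weights move after EACH example, so the
# order matters; a fresh permutation every epoch randomizes that order.
w_sgd = np.zeros(d + 1)
for _ in range(n_epochs):
    for n in rng.permutation(N):
        w_sgd -= eta * grad_single(w_sgd, X[n], y[n])

print("batch GD weights:", w_batch)
print("SGD weights:     ", w_sgd)
[/CODE]

Permuting the rows of (X, y) before running this leaves w_batch the same (up to floating-point round-off), but generally produces a different w_sgd.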
__________________
Where everyone thinks alike, no one thinks very much
  #3  
Old 05-04-2013, 08:03 PM
marek
Member
 
Join Date: Apr 2013
Posts: 31
Re: Hw5 q8 data permutation

Quote:
Originally Posted by yaser
True. If we were applying batch mode, permutation would not change anything, since the weight update is done at the end of the epoch and takes all the examples into consideration regardless of the order in which they were presented. In stochastic gradient descent, however, the update is done after each example, so the order changes the outcome. These permutations ensure that the order is randomized, so we get the benefits of randomness that were mentioned briefly in Lecture 9.
I was just about to delete my post as I figured out my error. I missed the "stochastic" part and had not yet watched Lecture 10. That's what I get for trying to solve the homework before learning all the material =) Thanks so much for your quick reply!