LFD Book Forum  

LFD Book Forum > Course Discussions > Online LFD course > Homework 5

#1  07-28-2012, 08:31 AM
invis
Senior Member
Join Date: Jul 2012
Posts: 50
SGD Movie rating example

I don't understand the example from the start of Lecture 10 about movie ratings. Maybe someone can tell me what we are supposed to do with the errors.

How do we minimize all the errors and get the final hypothesis?

Tell me where I'm wrong, please:

1) All we need to do is: [attached image]

2) Will this be the final hypothesis?

3) Compute the gradient of the error like this (using \eta = 0.1)? [attached image]

where w(0) = our error.

P.S. All I am trying to do is understand this example from start to finish, to the point where we have the hypothesis function.
#2  07-28-2012, 01:08 PM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,478
Re: SGD Movie rating example

Quote:
Originally Posted by invis
All I am trying to do is understand this example from start to finish, to the point where we have the hypothesis function.
The hypothesis in this case is \hat r_{ij} = \sum_{k=1}^K u_{ik}v_{jk} (I am putting a hat to distinguish our estimate of the rating, which is the hypothesis, from the actual rating, which is the target). In terms of our standard notation, g({\bf x})=\hat r_{ij}, where the indices ij play the role of the input {\bf x}, and the factors u_{ik}, v_{jk} play the role of the parameters of the hypothesis set. Each set of values of these parameters corresponds to a hypothesis h, and the final set of values when SGD terminates corresponds to the final hypothesis g.

Now, a step in SGD modifies u_{ik}, v_{jk} according to the gradient of the error on one example ({\bf e}_{ij} in the slide you quote in your post). You get the partial derivative of {\bf e}_{ij} with respect to each u_{ik} and each v_{jk}, and move along the negative of the gradient.

For each example, there are only 2K of these parameters for which the gradient is non-zero (the rest of the parameters are not involved in \hat r_{ij}, so the partial derivative with respect to them is zero). When you go to another example, it will involve other parameters, so by the time you have gone through all the examples, all the parameters will have been involved.
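A minimal sketch of this update in Python (an editor's illustration, not from the lecture: the per-example squared error e_{ij} = (r_{ij} - \hat r_{ij})^2, the learning rate, and the matrix sizes are all assumptions):

```python
import numpy as np

def sgd_step(U, V, i, j, r_ij, eta=0.1):
    """One SGD step on the single example (i, j), using the assumed
    per-example squared error e_ij = (r_ij - U[i] @ V[j]) ** 2.
    Only the rows U[i] and V[j] (2K parameters in total) have
    non-zero partial derivatives, so only they get updated."""
    r_hat = U[i] @ V[j]                 # hypothesis: sum_k u_ik * v_jk
    residual = r_ij - r_hat
    grad_u = -2.0 * residual * V[j]     # d e_ij / d u_ik = -2 (r - r_hat) v_jk
    grad_v = -2.0 * residual * U[i]     # d e_ij / d v_jk = -2 (r - r_hat) u_ik
    U[i] -= eta * grad_u                # move along the negative gradient
    V[j] -= eta * grad_v
    return residual ** 2                # error on this example before the step

# small random initialization (zeros cause the symmetry problem noted below)
rng = np.random.default_rng(0)
U = rng.uniform(0.0, 0.1, size=(4, 3))  # 4 viewers, K = 3 factors each
V = rng.uniform(0.0, 0.1, size=(5, 3))  # 5 movies,  K = 3 factors each
e_before = sgd_step(U, V, i=1, j=2, r_ij=4.0)
e_after = (4.0 - U[1] @ V[2]) ** 2      # smaller than e_before
```

Repeating such steps example by example, each one touching only its own 2K parameters, eventually involves all the parameters, as described above.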
__________________
Where everyone thinks alike, no one thinks very much
#3  07-29-2012, 01:41 AM
invis
Re: SGD Movie rating example

Thanks for the answer, Professor.

But I still don't understand the SGD steps. How do I compute the gradient on one example? How do I get the partial derivatives of e_{ij} if u_{ik} and v_{jk} are unknown?
#4  07-29-2012, 12:54 PM
invis
Re: SGD Movie rating example

Should I just initialize all U and V to 0 and then start to compute the error, then the gradient, and so on?
#5  07-29-2012, 02:05 PM
yaser
Re: SGD Movie rating example

Quote:
Originally Posted by invis
Should I just initialize all U and V to 0 and then start to compute the error, then the gradient, and so on?
In general, initializing to small random numbers avoids symmetry problems.
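The symmetry problem is easy to see concretely (an editor's aside, assuming the per-example squared error e_{ij} = (r_{ij} - \hat r_{ij})^2): with U and V all zero, both partial derivatives vanish, so SGD can never leave the starting point.

```python
import numpy as np

# All-zeros initialization: the estimate r_hat and both gradients vanish,
# so the SGD update u_ik -= eta * de/du_ik (and likewise for v_jk)
# changes nothing, no matter how many steps are taken.
U = np.zeros((4, 3))
V = np.zeros((5, 3))
i, j, r_ij = 1, 2, 4.0
residual = r_ij - U[i] @ V[j]       # residual is 4.0, and yet:
grad_u = -2.0 * residual * V[j]     # zero vector, because V[j] is zero
grad_v = -2.0 * residual * U[i]     # zero vector, because U[i] is zero
```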
#6  07-29-2012, 02:27 PM
invis
Re: SGD Movie rating example

OK, so I initialize all the U's and V's to random numbers between 0 and 1, for example. Then I compute the first error and get 8.76; then how do I compute the gradient? I just can't catch the full algorithm.
#7  07-29-2012, 02:48 PM
yaser
Re: SGD Movie rating example

Quote:
Originally Posted by invis
OK, so I initialize all the U's and V's to random numbers between 0 and 1, for example. Then I compute the first error and get 8.76; then how do I compute the gradient? I just can't catch the full algorithm.
All you need is to compute the partial derivatives in order to get the gradient. Just think of the factors as variables that you are differentiating with respect to.
#8  07-29-2012, 10:50 PM
invis
Re: SGD Movie rating example

Excuse me for being annoying, but to be sure that I understand you right:

\triangledown e_{i,j} = (r_{ij}^2 - 2*\sum_{k=1}^K v_{jk} + \sum_{k=1}^K 2*u_{ik}*v_{jk}^2) + (r_{ij}^2 - 2*\sum_{k=1}^K u_{ik} + \sum_{k=1}^K 2*v_{jk}*u_{jk}^2)

So after computing \triangledown e_{i,j}, we do these steps:

1) u_{ik} = u_{ik} - \eta *\sum_{j=1}^J \triangledown e_{i,j}
2) v_{jk} = v_{jk} - \eta *\sum_{i=1}^I \triangledown e_{i,j}
3) repeat computing \triangledown e_{i,j} and steps 1-2 for all the data that we have

Looks strange, doesn't it?
#9  07-29-2012, 11:27 PM
yaser
Re: SGD Movie rating example

Quote:
Originally Posted by invis
\triangledown e_{i,j} = (r_{ij}^2 - 2*\sum_{k=1}^K v_{jk} + \sum_{k=1}^K 2*u_{ik}*v_{jk}^2) + (r_{ij}^2 - 2*\sum_{k=1}^K u_{ik} + \sum_{k=1}^K 2*v_{jk}*u_{jk}^2)
\triangledown e_{i,j} is a vector of 2K partial derivatives, and the formula for each partial derivative is much simpler than the above formula. May I suggest that you refresh the subject of partial derivatives and vectors and revisit this question?
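For reference, written out explicitly (an editor's note, assuming the per-example squared error e_{ij} = \left(r_{ij} - \sum_{k=1}^K u_{ik}v_{jk}\right)^2, which the thread does not state explicitly), the 2K partial derivatives are

\frac{\partial e_{ij}}{\partial u_{ik}} = -2\left(r_{ij} - \hat r_{ij}\right)v_{jk}, \qquad \frac{\partial e_{ij}}{\partial v_{jk}} = -2\left(r_{ij} - \hat r_{ij}\right)u_{ik}, \qquad k = 1, \ldots, K,

and these 2K numbers are the components of \triangledown e_{i,j}.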
#10  07-30-2012, 09:47 AM
invis
Re: SGD Movie rating example

Is this right:
\triangledown e_{ij} = \frac{\partial e_{ij}}{\partial u_{ik}}\hat u_i + \frac{\partial e_{ij}}{\partial v_{jk}} \hat v_j
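Putting the whole procedure discussed in this thread together, a minimal end-to-end sketch (an editor's illustration: the ratings table, dimensions, learning rate, and the squared per-example error are all made-up assumptions):

```python
import numpy as np

# Editor's end-to-end illustration with a tiny made-up ratings table.
# Each known rating is a triple (viewer i, movie j, rating r_ij); the
# hypothesis is r_hat_ij = U[i] @ V[j] with K latent factors per row.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
K, eta = 3, 0.01
rng = np.random.default_rng(1)
U = rng.uniform(0.0, 0.1, size=(3, K))   # 3 viewers; small random init
V = rng.uniform(0.0, 0.1, size=(2, K))   # 2 movies

def total_error():
    return sum((r - U[i] @ V[j]) ** 2 for i, j, r in ratings)

e_start = total_error()
for _ in range(500):                     # many passes over the data set
    for i, j, r in ratings:              # one SGD step per example
        residual = r - U[i] @ V[j]
        U[i], V[j] = (U[i] + eta * 2.0 * residual * V[j],
                      V[j] + eta * 2.0 * residual * U[i])
e_end = total_error()                    # far below e_start
```

The final U and V reached when the loop stops are the parameters of the final hypothesis g, in the sense of post #2.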


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.