SGD Movie rating example
I don't understand the movie rating example from the start of Lecture 10.
Maybe someone can tell me what we are supposed to do with the errors. How do we minimise all the errors and get the final hypothesis? Tell me where I'm wrong, please:
1) All we need to do is:
2) Will it be the final hypothesis?
3) Compute the gradient of the error like this (using nu=0.1)? where w(0) = our error.
P.S. All I am trying to do is understand this example from start to finish, up to the point where we have the hypothesis function.
Re: SGD Movie rating example
Now, a step in SGD modifies the parameters according to the gradient of the error on one example (e_ij in the slide you quote in your post). You get the partial derivative of e_ij with respect to each u_ik and each v_jk, and move along the negative of the gradient. For each example, there are only 2K of these parameters for which the gradient is nonzero (the K factors of user i and the K factors of movie j); the rest of the parameters are not involved in e_ij, so the partial derivative with respect to them is zero. When you go to another example, it will involve other parameters, so by the time you have gone through all examples, all the parameters will have been involved.
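The step described above can be sketched in a few lines. This is only an illustration, assuming the squared error e_ij = (r_ij - sum_k u_ik v_jk)^2 from the Lecture 10 slide; the names `sgd_step`, `run_sgd`, `eta` and the toy data layout are made up for this sketch, with `eta` playing the role of the 0.1 step size mentioned in the question.

```python
import random

def sgd_step(U, V, i, j, r_ij, eta=0.1):
    """One SGD update using only the rating r_ij (user i, movie j).

    Assumes e_ij = (r_ij - sum_k U[i][k] * V[j][k]) ** 2.
    Only row i of U and row j of V are touched (2K parameters).
    Returns the squared error measured before the update.
    """
    K = len(U[i])
    residual = r_ij - sum(U[i][k] * V[j][k] for k in range(K))
    for k in range(K):
        u_ik, v_jk = U[i][k], V[j][k]  # keep old values for both updates
        # d e_ij / d u_ik = -2 * residual * v_jk; step along the negative gradient
        U[i][k] = u_ik + eta * 2 * residual * v_jk
        # d e_ij / d v_jk = -2 * residual * u_ik
        V[j][k] = v_jk + eta * 2 * residual * u_ik
    return residual * residual

def run_sgd(ratings, n_users, n_movies, K=2, eta=0.01, epochs=100, seed=0):
    """Hypothetical training loop: small random init, then repeated passes."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.1) for _ in range(K)] for _ in range(n_users)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(K)] for _ in range(n_movies)]
    examples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(examples)  # visit the ratings in random order
        for i, j, r_ij in examples:
            sgd_step(U, V, i, j, r_ij, eta)
    return U, V
```

With ratings given as `(user, movie, rating)` triples, `run_sgd([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)], n_users=2, n_movies=2)` returns the learned factors, and the hypothesis for any user/movie pair is then the inner product of the corresponding rows of U and V.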
Re: SGD Movie rating example
Should I just initialize all U and V to 0, then compute the error, then the gradient, and so on?

Re: SGD Movie rating example
In general, initializing to small random numbers avoids symmetry problems.
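One concrete way to see why all zeros is a poor starting point, again assuming the squared error e_ij = (r_ij - sum_k u_ik v_jk)^2: every partial derivative is then multiplied by a zero factor, so the update never moves. The helper name `rating_gradient` below is made up for this sketch.

```python
def rating_gradient(u_i, v_j, r_ij):
    """Gradient of e_ij = (r_ij - u_i . v_j) ** 2 w.r.t. u_i and v_j."""
    residual = r_ij - sum(u * v for u, v in zip(u_i, v_j))
    grad_u = [-2 * residual * v for v in v_j]  # d e_ij / d u_ik
    grad_v = [-2 * residual * u for u in u_i]  # d e_ij / d v_jk
    return grad_u, grad_v

# All-zero factors: every component of the gradient is zero, so SGD is stuck.
zero_grad_u, zero_grad_v = rating_gradient([0.0, 0.0], [0.0, 0.0], 4.0)

# Small nonzero factors: the gradient is nonzero and the update can proceed.
grad_u, grad_v = rating_gradient([0.1, 0.2], [0.3, 0.4], 1.0)
```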
Re: SGD Movie rating example
OK, so I initialize all U's and V's to random numbers between 0 and 1, for example. Then I compute the first error and get 8.76. How do I compute the gradient from there? I just can't grasp the full algorithm.

Re: SGD Movie rating example
All you need is to compute the partial derivatives in order to get the gradient. Just think of the factors as variables that you are differentiating with respect to.
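As a worked version of that hint, here is what the partial derivatives look like if the error on one rating is the squared error from the Lecture 10 slide. The symbols r_ij (the rating), u_ik and v_jk (the user and movie factors), K (the number of factors) and η (the learning rate) are assumed to match the lecture's notation.

```latex
e_{ij} = \Big( r_{ij} - \sum_{k=1}^{K} u_{ik} v_{jk} \Big)^{2}

\frac{\partial e_{ij}}{\partial u_{ik}}
  = -2 \Big( r_{ij} - \sum_{k'=1}^{K} u_{ik'} v_{jk'} \Big) v_{jk},
\qquad
\frac{\partial e_{ij}}{\partial v_{jk}}
  = -2 \Big( r_{ij} - \sum_{k'=1}^{K} u_{ik'} v_{jk'} \Big) u_{ik}

u_{ik} \leftarrow u_{ik} - \eta \, \frac{\partial e_{ij}}{\partial u_{ik}},
\qquad
v_{jk} \leftarrow v_{jk} - \eta \, \frac{\partial e_{ij}}{\partial v_{jk}}
```

Both updates should use the values of u_ik and v_jk from before the step, and they are applied for k = 1, ..., K.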
Re: SGD Movie rating example
The gradient is a vector of partial derivatives, and the formula for each partial derivative is much simpler than the above formula. May I suggest that you refresh the subject of partial derivatives and vectors and revisit this question?
