LFD Book Forum


ilya239 02-12-2013 03:25 PM

movie ratings
 
What is the intuition behind the form of the hypothesis function for movie ratings?



I'm trying to understand why it makes sense to multiply a user factor by a movie factor to get that factor's contribution to the rating. E.g. if a user doesn't like horror movies and the movie has a low "horror movie" rating, multiplying these together gives a low number. Shouldn't the rating be based on the distance/difference between a user's value for a factor and a movie's value for that factor?

I understand that in a learning situation the factors do not have specific interpretations -- there is just a list of factors. Still, the motivation was clearly that there are factors (horror-ness, comedy-ness etc). So what is the motivation behind taking the product of factors instead of some form of their difference?

yaser 02-12-2013 03:32 PM

Re: movie ratings
 
Quote:

Originally Posted by ilya239 (Post 9367)
I'm trying to understand why it makes sense to multiply a user factor by a movie factor to get that factor's contribution to the rating. E.g. if a user doesn't like horror movies and the movie has a low "horror movie" rating, multiplying these together gives a low number. Shouldn't the rating be based on the distance/difference between a user's value for a factor and a movie's value for that factor?

An inner product is just a model, motivated by the maximization of such a product (in the case of unit norms) when the two vectors match exactly. Your idea of a distance is another model, also with its own plausibility. The learning algorithm will adjust the values of the parameters for each model so that the error is minimum, and the only objective way of comparing the plausibility of the two models is to compare their out-of-sample performance at that point.
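As a rough illustration of the unit-norm remark, the sketch below compares the inner product with the squared Euclidean distance for a few hand-picked factor vectors. The vectors, the movie labels, and the helper names (`dot`, `normalize`, `sq_dist`) are made up for this example and do not come from the homework. For unit-length vectors the identity ||u - v||^2 = 2 - 2 (u . v) holds, so in that special case the two models rank matches the same way.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical factor vectors, chosen only for illustration.
user = normalize([0.9, 0.1, 0.4])
movies = {
    "same_taste": normalize([0.9, 0.1, 0.4]),
    "close": normalize([0.7, 0.3, 0.5]),
    "opposite": normalize([-0.9, -0.1, -0.4]),
}

for name, movie in movies.items():
    ip = dot(user, movie)
    d2 = sq_dist(user, movie)
    # For unit vectors the last two printed numbers agree: d2 == 2 - 2 * ip.
    print(name, round(ip, 3), round(d2, 3), round(2 - 2 * ip, 3))
```

Without the unit-norm assumption the identity picks up the extra terms ||u||^2 and ||v||^2, and the two models can disagree, which is why comparing them comes down to out-of-sample performance.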

ilya239 02-12-2013 06:22 PM

Re: movie ratings
 
Ah, if each vector is normalized to unit length then this makes sense. But, there is no way to constrain the vector component values during gradient descent so that the vectors stay at unit length. Or is it that each vector is normalized every time we compute the dot product? I understand wanting the vectors to point in the same direction, but the vector magnitude seems like a distraction.

I know that model parameters needn't have a human-understandable interpretation (cf. hidden layers of neural networks), but if they do, it helps to see that the intuition makes sense :)

yaser 02-12-2013 07:05 PM

Re: movie ratings
 
Quote:

Originally Posted by ilya239 (Post 9374)
Ah, if each vector is normalized to unit length then this makes sense. But, there is no way to constrain the vector component values during gradient descent so that the vectors stay at unit length. Or is it that each vector is normalized every time we compute the dot product? I understand wanting the vectors to point in the same direction, but the vector magnitude seems like a distraction.

The vectors are not normalized, at least not deliberately. The argument was only meant to motivate that the inner product has a matching aspect. However, even if we consider the magnitude to be a distraction, the learning algorithm has the opportunity to keep the magnitude fixed if that helps reduce the error value.
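To make the point about magnitudes concrete, here is a minimal sketch of stochastic gradient descent on the squared error (r - u . v)^2 for a single user/movie pair, with no normalization step. The number of factors `K`, the learning rate `eta`, the rating value, and the function names are assumptions chosen for illustration, not the setup used in the course.

```python
import random

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def sgd_step(u, v, rating, eta):
    """One SGD step on (rating - u.v)^2; the factor 2 is folded into eta."""
    err = rating - dot(u, v)
    # Both updates use the old u and v, so neither sees the other's change.
    new_u = [a + eta * err * b for a, b in zip(u, v)]
    new_v = [b + eta * err * a for a, b in zip(u, v)]
    return new_u, new_v

random.seed(0)
K = 3            # hypothetical number of factors
rating = 4.0     # hypothetical observed rating
u = [random.uniform(-0.5, 0.5) for _ in range(K)]
v = [random.uniform(-0.5, 0.5) for _ in range(K)]

initial_error = (rating - dot(u, v)) ** 2
for _ in range(200):
    u, v = sgd_step(u, v, rating, eta=0.05)
final_error = (rating - dot(u, v)) ** 2

norm_u = dot(u, u) ** 0.5
norm_v = dot(v, v) ** 0.5
print("error before/after:", initial_error, final_error)
print("norms after training:", norm_u, norm_v)
```

Nothing in the update keeps the vectors at unit length; the magnitudes settle wherever the error is reduced. In this toy case unit-length vectors could not fit the rating at all, since their inner product is at most 1.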

ilya239 02-12-2013 07:13 PM

Re: movie ratings
 
Got it. I guess the learning algorithm cares most about the number of parameters, and forcing normalization would only reduce that by two.

Point of learning is to not have to guess the target function or even its form, but it's hard to resist micromanaging the process :)

Thanks!

yaser 02-12-2013 07:17 PM

Re: movie ratings
 
Quote:

Originally Posted by ilya239 (Post 9378)
Point of learning is to not have to guess the target function or even its form, but it's hard to resist micromanaging the process

Nicely put. Sometimes there is a compelling reason to introduce a particular functional form or constraint, but this is the exception not the rule.


All times are GMT -7.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.