LFD Book Forum  

#1
02-12-2013, 03:25 PM
ilya239
Senior Member
Join Date: Jul 2012
Posts: 58
movie ratings

What is the intuition behind the form of the hypothesis function for movie ratings?



I'm trying to understand why it makes sense to multiply a user factor by the corresponding movie factor to get that factor's contribution to the rating. E.g., if a user doesn't like horror movies and the movie has a low "horror movie" rating, multiplying these together gives a low number. Shouldn't the rating be based on the distance/difference between a user's value for a factor and a movie's value for that factor?

I understand that in a learning situation the factors do not have specific interpretations -- there is just a list of factors. Still, the motivation was clearly that there are factors (horror-ness, comedy-ness, etc.). So what is the motivation behind taking the product of factors instead of some form of their difference?
#2
02-12-2013, 03:32 PM
yaser
Caltech
Join Date: Aug 2009
Location: Pasadena, California, USA
Posts: 1,478
Re: movie ratings

Quote:
Originally Posted by ilya239
I'm trying to understand why it makes sense to multiply a user factor by the corresponding movie factor to get that factor's contribution to the rating. E.g., if a user doesn't like horror movies and the movie has a low "horror movie" rating, multiplying these together gives a low number. Shouldn't the rating be based on the distance/difference between a user's value for a factor and a movie's value for that factor?
An inner product is just a model, motivated by the fact that (in the case of unit norms) such a product is maximized when the two vectors match exactly. Your idea of a distance is another model, with its own plausibility. The learning algorithm will adjust the parameter values of each model to minimize the error, and the only objective way to compare the two models is to compare their out-of-sample performance at that point.
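To make the two candidate models concrete, here is a minimal sketch in Python with NumPy. The function names, the example vectors, and the choice of negative squared Euclidean distance for the "distance" model are assumptions for illustration; the thread does not specify a particular distance form.

```python
import numpy as np

def inner_product_score(u, v):
    """Model discussed in the thread: sum over factors of user factor times movie factor."""
    return float(np.dot(u, v))

def distance_score(u, v):
    """Alternative model (assumed form): negative squared distance, so closer means higher."""
    return -float(np.sum((u - v) ** 2))

def unit(x):
    """Scale a vector to unit Euclidean norm."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

# Hypothetical factor vectors, normalized to unit length for this comparison.
user = unit([0.9, 0.1, -0.4])
matching_movie = user.copy()
other_movie = unit([-0.2, 0.8, 0.5])

# With unit norms, the inner product is largest when the vectors match exactly.
print(inner_product_score(user, matching_movie))
print(inner_product_score(user, other_movie))
print(distance_score(user, matching_movie))
print(distance_score(user, other_movie))
```

For unit-norm vectors the two scores are tied together by the identity -||u - v||^2 = 2(u . v) - 2, so they rank movies identically. They behave differently only once the magnitudes are allowed to vary, which is where out-of-sample comparison becomes the deciding test.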
__________________
Where everyone thinks alike, no one thinks very much
#3
02-12-2013, 06:22 PM
ilya239
Senior Member
Re: movie ratings

Ah, if each vector is normalized to unit length then this makes sense. But, there is no way to constrain the vector component values during gradient descent so that the vectors stay at unit length. Or is it that each vector is normalized every time we compute the dot product? I understand wanting the vectors to point in the same direction, but the vector magnitude seems like a distraction.

I know that model parameters needn't have a human-understandable interpretation (cf. hidden layers of neural networks), but if they do, it helps to see that the intuition makes sense.
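The option raised above, normalizing each vector every time the dot product is computed, amounts to scoring by cosine similarity. A minimal sketch, assuming NumPy and hypothetical factor values (this is not the hypothesis used in the course):

```python
import numpy as np

def cosine_score(u, v):
    """Inner product of u and v after scaling each one to unit length."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

user = np.array([0.9, 0.1, -0.4])    # hypothetical user factors
movie = np.array([0.6, 0.3, -0.1])   # hypothetical movie factors

# Rescaling either vector leaves the score unchanged: only direction matters.
print(cosine_score(user, movie))
print(cosine_score(2.0 * user, 3.0 * movie))
```

Since this score always lies in [-1, 1], fitting it to an actual rating scale would need some extra scale or offset, which is one way the magnitude can carry useful information instead of being a pure distraction.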
#4
02-12-2013, 07:05 PM
yaser
Caltech
Re: movie ratings

Quote:
Originally Posted by ilya239
Ah, if each vector is normalized to unit length then this makes sense. But, there is no way to constrain the vector component values during gradient descent so that the vectors stay at unit length. Or is it that each vector is normalized every time we compute the dot product? I understand wanting the vectors to point in the same direction, but the vector magnitude seems like a distraction.
The vectors are not normalized, at least not deliberately. The argument was only meant to motivate that the inner product has a matching aspect. However, even if we consider the magnitude to be a distraction, the learning algorithm has the opportunity to keep the magnitude fixed if that helps reduce the error value.
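A minimal sketch of what "not normalized, at least not deliberately" can look like in practice, assuming stochastic gradient descent on squared error over the observed ratings. The function names, toy ratings, learning rate, and number of factors are illustrative assumptions, not taken from the course.

```python
import numpy as np

def sgd_factorize(ratings, n_users, n_movies, k=2, lr=0.02, epochs=500, seed=0):
    """Fit user and movie factor vectors by SGD on squared error.

    ratings: list of (user_index, movie_index, rating) triples.
    No normalization step is applied; magnitudes are learned freely.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))   # user factors
    V = 0.1 * rng.standard_normal((n_movies, k))  # movie factors
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - U[i] @ V[j]        # residual of the inner-product model
            u_old = U[i].copy()          # keep the old value for the V update
            U[i] += lr * err * V[j]
            V[j] += lr * err * u_old
    return U, V

def mean_squared_error(ratings, U, V):
    """Average squared residual over the given ratings."""
    return float(np.mean([(r - U[i] @ V[j]) ** 2 for i, j, r in ratings]))

# Hypothetical toy data: (user, movie, rating).
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0), (2, 1, 5.0)]

U0, V0 = sgd_factorize(ratings, n_users=3, n_movies=2, epochs=0)
U, V = sgd_factorize(ratings, n_users=3, n_movies=2)

print(mean_squared_error(ratings, U0, V0), mean_squared_error(ratings, U, V))
print(np.linalg.norm(U, axis=1), np.linalg.norm(V, axis=1))  # learned magnitudes
```

Nothing in the update ties the norms to 1. If holding them fixed reduced the error, gradient descent would be free to settle there; otherwise the magnitudes act as extra parameters the fit can use.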
#5
02-12-2013, 07:13 PM
ilya239
Senior Member
Re: movie ratings

Got it. I guess the learning algorithm cares most about the number of parameters, and forcing normalization would only remove one degree of freedom per vector (two for a given user-movie pair).

The point of learning is not to have to guess the target function or even its form, but it's hard to resist micromanaging the process.

Thanks!
#6
02-12-2013, 07:17 PM
yaser
Caltech
Re: movie ratings

Quote:
Originally Posted by ilya239
The point of learning is not to have to guess the target function or even its form, but it's hard to resist micromanaging the process.
Nicely put. Sometimes there is a compelling reason to introduce a particular functional form or constraint, but this is the exception, not the rule.
All times are GMT -7.


The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.