LFD Book Forum  

  #1  
Old 07-09-2012, 05:36 AM
zzzzz
Junior Member
 
Join Date: Apr 2012
Posts: 4
How to calculate VC dimension for matrix factorization (Netflix-like) tasks

Let's say we have some users (u) and some movies (m), and some ratings. The variable we control is d, the number of factors for each movie/user.

We would like to know how many data points we need according to VC theory. This number presumably depends on d; in other words, we want to know what d we can afford given the data points we have.

So, how do we go about calculating VC dimension here? Is it (u + m) * d?
  #2  
Old 07-09-2012, 07:02 PM
magdon
RPI
 
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 595
Re: How to calculate VC dimension for matrix factorization (Netflix-like) tasks

The VC dimension is a measure that characterises the complexity of a "set system" - a set of sets. A hypothesis set consisting of binary classification functions directly corresponds to a set system - a hypothesis h corresponds to the set of points on which h=+1. The VC dimension measures the diversity of sets in your set system and in the context of binary classification, it has an implication on the generalization ability of the final hypothesis obtained by selecting a "set" that best fits the data.

Now, there is no direct link between the problem of matrix factorization and set systems, so the VC dimension is not yet well defined here. If your task is to observe data and obtain a low-rank representation of the user-movie ratings matrix, and you want to know how good your factorization method is, i.e. how well it is able to recover the true ratings matrix (a form of generalization), there is no immediate way to use the concept of the VC dimension.

We can use the VC dimension if you can formulate a binary classification problem. It seems that when you say "matrix factorization" you are not interested in the complexity of the matrix factorization problem; you are interested in the tool "matrix factorization" as a way to predict how a user will rate a movie. To make this problem simpler, say you are interested in whether the user will rate a movie 4 or higher versus lower than 4. We have now constructed a binary classification problem. The next step is to specify the input. In this case, the input is a huge vector x corresponding to a user and containing an entry for every movie (the entry is 0 if the movie was not rated by the user, and the actual rating (>0) if it was rated); the prediction task is to populate every 0-entry with +1 or -1, indicating whether the user will rate that movie \ge 4 or not.
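For concreteness, here is a minimal sketch of how the user vector x and the binary labels described above could be constructed (the toy ratings matrix and its sizes are made up for illustration):

```python
import numpy as np

# Hypothetical toy ratings matrix: rows = users, columns = movies.
# 0 means "not rated"; positive entries are observed ratings (1-5).
R = np.array([
    [5, 0, 3, 0],
    [0, 4, 0, 2],
    [1, 0, 5, 4],
])

user = 0
x = R[user]                      # the "huge vector" for this user
unrated = np.where(x == 0)[0]    # the 0-entries the classifier must fill in
rated = np.where(x > 0)[0]

# Binary target for a rated (user, movie) pair: +1 if rating >= 4, else -1
y = np.where(x[rated] >= 4, 1, -1)
print(rated, y)                  # movies 0 and 2 were rated; labels [+1, -1]
```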

Now we must define your hypothesis set, and this is where matrix factorization comes in: You are going to use matrix factorization to compute (say) a set of K "movie factors". Then, you will summarize the user x by constructing the "projection" onto these K factors and then constructing a linear model based on these K factors:

x\longrightarrow z=\phi(x)\longrightarrow h(x)=sign(\tilde w^T z)

The transformation \phi(x) represents the process of projecting onto the K selected factors.
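As a small sketch of this pipeline (the factor matrix V below is random, standing in for factors that a real matrix factorization would produce; all sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
m, K = 6, 2                       # m movies, K factors (illustrative sizes)

# V (m x K) stands in for K movie factors obtained from a matrix
# factorization; here it is random, purely for the sketch.
V = rng.standard_normal((m, K))

def phi(x):
    """Project a user's rating vector onto the K movie factors."""
    return V.T @ x                # z = phi(x) has dimension K, not m

w_tilde = rng.standard_normal(K)  # linear model in the factor space

x = rng.standard_normal(m)        # a user vector (dense here for simplicity)
z = phi(x)
h = np.sign(w_tilde @ z)          # h(x) = sign(w~^T z)
```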

Here is a very important point: if the transformation \phi(x) were fixed ahead of time, before seeing the data, then the VC dimension of the model would be dimension(z)=K.

Unfortunately, it is exactly in the construction of \phi(x) that your question comes in, because that is where the matrix factorization is going to be used. That is, \phi(x) is not fixed ahead of time before looking at the data; it is constructed after looking at the data, using matrix factorization of the (typically sparsely populated) observed ratings matrix. So to compute the VC dimension you need to consider all the possible ways of constructing \phi(x) together with the linear model at the end, sign(\tilde w^T z) - i.e. you have a cascade of \phi(x) with sign(\tilde w^T z). This is similar to a 2-layer neural network, and in general there is no easy way to analyze the VC dimension of this cascade without knowing in detail the diversity of possible \phi(x) that can result from the matrix factorization.

If you use a linear approach, so \phi(x)=W^T x where the weight matrix W is obtained through your matrix factorization algorithm, then this is just a cascade of linear models, which is ultimately just a linear model in x, and the VC dimension is at most dimension(x)=m (in your notation). In practice it will be a lot better than this because you are not simultaneously tuning W and \tilde w; i.e. you are learning the factors in an essentially unsupervised manner and fixing them (similar to what was done for RBF learning).
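A sketch of the two-stage approach in the last sentence (unsupervised factors fixed first, then a supervised linear fit; the data, the target, and the use of a truncated SVD as the factorization are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
u, m, K = 50, 20, 3                       # users, movies, factors (made up)
R = rng.integers(0, 6, size=(u, m)).astype(float)   # 0 = unrated

# Stage 1 (unsupervised): learn K movie factors from the ratings matrix
# via a truncated SVD, then FIX them -- analogous to fixing RBF centers.
_, _, Vt = np.linalg.svd(R, full_matrices=False)
V = Vt[:K].T                              # m x K movie-factor matrix

# Stage 2 (supervised): with phi fixed, the model is linear in z = V^T x,
# so the effective dimension of the final fit is K, not m.
Z = R @ V                                 # every user projected onto K factors
y = np.sign(R[:, 0] - 2.5)                # toy target: does the user like movie 0?

# Least-squares fit of the final linear layer w_tilde
w_tilde, *_ = np.linalg.lstsq(Z, y, rcond=None)
preds = np.sign(Z @ w_tilde)
acc = (preds == y).mean()                 # training accuracy of the cascade
```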


__________________
Have faith in probability
  #3  
Old 07-11-2012, 01:03 AM
htlin
NTU
 
Join Date: Aug 2009
Location: Taipei, Taiwan
Posts: 601
Re: How to calculate VC dimension for matrix factorization (Netflix-like) tasks

Assume the input space to be all the entries in the M by N (binary) matrix, and each hypothesis to be simply sign(P * Q) for two matrices P and Q whose product has rank at most K. The following paper

http://ttic.uchicago.edu/~nati/Publi...kolaNIPS04.pdf

gives a combinatorial upper bound on the growth function, and thus the VC dimension is bounded by K(M+N) times some log factors. Hope this helps.
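Plugging in rough Netflix-Prize-scale numbers gives a feel for the bound (the figures below are approximate and for illustration only, and the log factors are ignored):

```python
# Approximate Netflix Prize dimensions (illustrative, from memory):
M, N, K = 480_000, 17_700, 40   # users, movies, rank of the factorization
d_vc_upper = K * (M + N)        # the bound, ignoring the log factors
print(d_vc_upper)               # about 2 x 10^7
```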
__________________
When one teaches, two learn.

Tags
matrix factorization, netflix, vc dimension

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.