View Single Post
Old 07-09-2012, 08:02 PM
magdon's Avatar
magdon magdon is offline
Join Date: Aug 2009
Location: Troy, NY, USA.
Posts: 595
Default Re: How to calculate VC dimension for matrix factorization (Netflix-like) tasks

The VC dimension is a measure that characterises the complexity of a "set system" - a set of sets. A hypothesis set consisting of binary classification functions directly corresponds to a set system - a hypothesis h corresponds to the set of points on which h=+1. The VC dimension measures the diversity of sets in your set system and in the context of binary classification, it has an implication on the generalization ability of the final hypothesis obtained by selecting a "set" that best fits the data.

Now, there is no direct link between the problem of matrix factorization and set systems, so the VC dimension is not yet well defined. So if your task is to observe data and obtain a low rank representation of the user-movie ratings matrix, and you want to know how good your method for factorization is, i.e. how well is it able to recover the true ratings matrix (a form of generalization), there is no immediate way to use the concept of the VC dimension.

We can use the VC dimension if you can formulate a binary classification problem. It seems that when you say "matrix factorization" you are not intersted in the complexity of the matrix factorization problem; you are intersted in the tool "matrix factorization" as a way to predict how a user will rate a movie. To make this problem simpler, say you are interested in whether the user will rate 4 or higher versus lower than 4. We have now constructed a binary classification problem. The next step is to specify the input. In this case, the input is a huge vector x corresponding to a user and containing an entry for every movie (the entry is 0 if the movie was not rated by the user and the actual rating (>0) if it was rated); the prediction task is to populate every 0-entry with +1 or -1 indicating whether the user will rate that movie \ge 4 or not.

Now we must define your hypothesis set, and this is where matrix factorization comes in: You are going to use matrix factorization to compute (say) a set of K "movie factors". Then, you will summarize the user x by constructing the "projection" onto these K factors and then constructing a linear model based on these K factors:

x\longrightarrow z=\phi(x)\longrightarrow h(x)=sign(\tilde w^T z)

The transformation \phi(x) represents the process of projecting onto the K selected factors.

Here is a very important poiont: if the transformation \phi(x) was fixed ahead of time, before seeing the data, then the VC dimension of the model is dimension(z)=K.

Unfortunately, it is exactly in the construction of \phi(x) that your question comes in, because that is where the matrix factorization is going to be used. That is, \phi(x) is not fixed ahead of time before looking at the data; it is constructed after looking at the data using matrix factorization of the (typically sparsely populated) observed ratings matrix. So to compute the VC dimension you need to consider all the possible ways there are of constructing \phi(x) together with the linear model at the end sign(w^T z) - i.e. you have a cascade of \phi(x) with sign(w^T z). This is similar to a 2-layer neural network, and in general there is no easy way to analyze the VC dimension of this cascade without knowing in detail the diversity of possible \phi(x) that can result from the matrix factorization. If you use a linear approach so \phi(x)=w^T x where the weights w are obtained through your matrix factorization algorithm, then this is just a cascade of linear models which is ultimately just a linear model in x,and the VC dimension is at most dimension(x)=m (in your notation). In practice it will be a lot better than this because you are not simultaneously tuning w,\tilde w; i.e. you are learning the factors in an essentially unsupervised manner and fixing them (similar to what was done for RBF learning).

Originally Posted by zzzzz View Post
Let's say we have some users (u) and some movies (m), and some ratings. The variable we control is d, a number of factors for each movie/user.

We would like to know how many data points we need according to VC theory. This number probably depends on d, so in other words, what d we can afford given the data points we have.

So, how do we go about calculating VC dimension here? Is it (u + m) * d?
Have faith in probability
Reply With Quote