The VC dimension is a measure that characterises the complexity of a "set system"  a set of sets. A hypothesis set consisting of binary classification functions directly corresponds to a set system  a hypothesis
corresponds to the set of points on which
. The VC dimension measures the diversity of sets in your set system and in the context of binary classification, it has an implication on the generalization ability of the final hypothesis obtained by selecting a "set" that best fits the data.
Now, there is no direct link between the problem of matrix factorization and set systems, so the VC dimension is not yet well defined. So if your task is to observe data and obtain a low rank representation of the usermovie ratings matrix, and you want to know how good your method for factorization is, i.e. how well is it able to recover the true ratings matrix (a form of generalization), there is no immediate way to use the concept of the VC dimension.
We can use the VC dimension if you can formulate a binary classification problem. It seems that when you say "matrix factorization" you are not intersted in the complexity of the matrix factorization problem; you are intersted in the tool "matrix factorization" as a way to predict how a user will rate a movie. To make this problem simpler, say you are interested in whether the user will rate 4 or higher versus lower than 4. We have now constructed a binary classification problem. The next step is to specify the input. In this case, the input is a huge vector
corresponding to a user and containing an entry for every movie (the entry is 0 if the movie was not rated by the user and the actual rating (
) if it was rated); the prediction task is to populate every 0entry with +1 or 1 indicating whether the user will rate that movie
or not.
Now we must define your hypothesis set, and this is where matrix factorization comes in: You are going to use matrix factorization to compute (say) a set of
"movie factors". Then, you will summarize the user
by constructing the "projection" onto these
factors and then constructing a linear model based on these
factors:
The transformation
represents the process of projecting onto the
selected factors.
Here is a very important poiont:
if the transformation was fixed ahead of time, before seeing the data, then the VC dimension of the model is .
Unfortunately, it is exactly in the construction of
that your question comes in, because that is where the matrix factorization is going to be used. That is,
is not fixed ahead of time before looking at the data; it is constructed
after looking at the data using matrix factorization of the (typically sparsely populated) observed ratings matrix. So to compute the VC dimension you need to consider all the possible ways there are of constructing
together with the linear model at the end
 i.e. you have a cascade of
with
. This is similar to a 2layer neural network, and in general there is no easy way to analyze the VC dimension of this cascade without knowing in detail the diversity of possible
that can result from the matrix factorization. If you use a linear approach so
where the weights
are obtained through your matrix factorization algorithm, then this is just a cascade of linear models which is ultimately just a linear model in
,and the VC dimension is
at most (in your notation). In practice it will be a lot better than this because you are not simultaneously tuning
; i.e. you are learning the factors in an essentially unsupervised manner and fixing them (similar to what was done for RBF learning).
Quote:
Originally Posted by zzzzz
Let's say we have some users (u) and some movies (m), and some ratings. The variable we control is d, a number of factors for each movie/user.
We would like to know how many data points we need according to VC theory. This number probably depends on d, so in other words, what d we can afford given the data points we have.
So, how do we go about calculating VC dimension here? Is it (u + m) * d?
