

Thread Tools  Display Modes 
#1




How to calculate VC dimension for matrix factorization (Netflixlike) tasks
Let's say we have some users (u) and some movies (m), and some ratings. The variable we control is d, a number of factors for each movie/user.
We would like to know how many data points we need according to VC theory. This number probably depends on d, so in other words, what d we can afford given the data points we have. So, how do we go about calculating VC dimension here? Is it (u + m) * d? 
#2




Re: How to calculate VC dimension for matrix factorization (Netflixlike) tasks
The VC dimension is a measure that characterises the complexity of a "set system"  a set of sets. A hypothesis set consisting of binary classification functions directly corresponds to a set system  a hypothesis corresponds to the set of points on which . The VC dimension measures the diversity of sets in your set system and in the context of binary classification, it has an implication on the generalization ability of the final hypothesis obtained by selecting a "set" that best fits the data.
Now, there is no direct link between the problem of matrix factorization and set systems, so the VC dimension is not yet well defined. So if your task is to observe data and obtain a low rank representation of the usermovie ratings matrix, and you want to know how good your method for factorization is, i.e. how well is it able to recover the true ratings matrix (a form of generalization), there is no immediate way to use the concept of the VC dimension. We can use the VC dimension if you can formulate a binary classification problem. It seems that when you say "matrix factorization" you are not intersted in the complexity of the matrix factorization problem; you are intersted in the tool "matrix factorization" as a way to predict how a user will rate a movie. To make this problem simpler, say you are interested in whether the user will rate 4 or higher versus lower than 4. We have now constructed a binary classification problem. The next step is to specify the input. In this case, the input is a huge vector corresponding to a user and containing an entry for every movie (the entry is 0 if the movie was not rated by the user and the actual rating () if it was rated); the prediction task is to populate every 0entry with +1 or 1 indicating whether the user will rate that movie or not. Now we must define your hypothesis set, and this is where matrix factorization comes in: You are going to use matrix factorization to compute (say) a set of "movie factors". Then, you will summarize the user by constructing the "projection" onto these factors and then constructing a linear model based on these factors: The transformation represents the process of projecting onto the selected factors. Here is a very important poiont: if the transformation was fixed ahead of time, before seeing the data, then the VC dimension of the model is . Unfortunately, it is exactly in the construction of that your question comes in, because that is where the matrix factorization is going to be used. That is, is not fixed ahead of time before looking at the data; it is constructed after looking at the data using matrix factorization of the (typically sparsely populated) observed ratings matrix. So to compute the VC dimension you need to consider all the possible ways there are of constructing together with the linear model at the end  i.e. you have a cascade of with . This is similar to a 2layer neural network, and in general there is no easy way to analyze the VC dimension of this cascade without knowing in detail the diversity of possible that can result from the matrix factorization. If you use a linear approach so where the weights are obtained through your matrix factorization algorithm, then this is just a cascade of linear models which is ultimately just a linear model in ,and the VC dimension is at most (in your notation). In practice it will be a lot better than this because you are not simultaneously tuning ; i.e. you are learning the factors in an essentially unsupervised manner and fixing them (similar to what was done for RBF learning). Quote:
__________________
Have faith in probability 
#3




Re: How to calculate VC dimension for matrix factorization (Netflixlike) tasks
Assume that the input space to be all the entries in the by (binary) matrix, and each hypothesis is simply sign() for two matrices of rank at most . The following paper
http://ttic.uchicago.edu/~nati/Publi...kolaNIPS04.pdf gives a combinatorial upper bound on the growth function, and thus the VC dimension is bounded by some log factors. Hope this helps.
__________________
When one teaches, two learn. 
Tags 
matrix factorization, netflix, vc dimension 
Thread Tools  
Display Modes  

