![]() |
|
![]() |
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
![]()
Let's say we have some users (u) and some movies (m), and some ratings. The variable we control is d, a number of factors for each movie/user.
We would like to know how many data points we need according to VC theory. This number probably depends on d, so in other words, what d we can afford given the data points we have. So, how do we go about calculating VC dimension here? Is it (u + m) * d? |
#2
|
||||
|
||||
![]()
The VC dimension is a measure that characterises the complexity of a "set system" - a set of sets. A hypothesis set consisting of binary classification functions directly corresponds to a set system - a hypothesis
![]() ![]() Now, there is no direct link between the problem of matrix factorization and set systems, so the VC dimension is not yet well defined. So if your task is to observe data and obtain a low rank representation of the user-movie ratings matrix, and you want to know how good your method for factorization is, i.e. how well is it able to recover the true ratings matrix (a form of generalization), there is no immediate way to use the concept of the VC dimension. We can use the VC dimension if you can formulate a binary classification problem. It seems that when you say "matrix factorization" you are not intersted in the complexity of the matrix factorization problem; you are intersted in the tool "matrix factorization" as a way to predict how a user will rate a movie. To make this problem simpler, say you are interested in whether the user will rate 4 or higher versus lower than 4. We have now constructed a binary classification problem. The next step is to specify the input. In this case, the input is a huge vector ![]() ![]() ![]() Now we must define your hypothesis set, and this is where matrix factorization comes in: You are going to use matrix factorization to compute (say) a set of ![]() ![]() ![]() ![]() ![]() The transformation ![]() ![]() Here is a very important poiont: if the transformation ![]() ![]() Unfortunately, it is exactly in the construction of ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Quote:
__________________
Have faith in probability |
#3
|
||||
|
||||
![]()
Assume that the input space to be all the entries in the
![]() ![]() ![]() ![]() http://ttic.uchicago.edu/~nati/Publi...kolaNIPS04.pdf gives a combinatorial upper bound on the growth function, and thus the VC dimension is bounded by ![]()
__________________
When one teaches, two learn. |
![]() |
Tags |
matrix factorization, netflix, vc dimension |
Thread Tools | |
Display Modes | |
|
|