Data independence

I was recently thinking about Facebook's friend-suggestion algorithm,
though I think the same problem applies to Netflix recommendations.

In both cases the assumption seems to be that data points are independent, so each one contributes equally to the result.

In the Facebook case, if I am friends with more than one person in a family, it suggests other friends of that family much more strongly than it should, because each family member counts as a separate piece of evidence. (Facebook doesn't necessarily know that they are related, though.)
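Here is a rough sketch of the difference. It does not describe how Facebook actually scores suggestions. The names `naive_score` and `clustered_score` are hypothetical, and the proxy for "related" is my own assumption: mutual friends who are also friends with each other get collapsed into a single vote.

```python
from itertools import combinations


def naive_score(mutual_friends):
    """Every mutual friend counts as one independent vote."""
    return len(mutual_friends)


def clustered_score(mutual_friends, friendships):
    """Count each connected group of mutual friends as one vote.

    Assumption: mutual friends who know each other (e.g. a family) are
    dependent evidence, so a whole connected group counts only once.
    `friendships` is a set of frozenset pairs.
    """
    parent = {f: f for f in mutual_friends}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(mutual_friends, 2):
        if frozenset((a, b)) in friendships:
            parent[find(a)] = find(b)  # merge the two groups

    return len({find(f) for f in mutual_friends})


friendships = {frozenset(p) for p in [("mom", "dad"), ("mom", "kid"), ("dad", "kid")]}
family_friend = ["mom", "dad", "kid"]   # candidate known through one family
other_person = ["coworker", "classmate"]  # candidate known through unrelated people

print(naive_score(family_friend), clustered_score(family_friend, friendships))  # 3 1
print(naive_score(other_person), clustered_score(other_person, friendships))    # 2 2
```

With naive counting the family friend outranks the other candidate 3 to 2; with clustering it is 1 to 2. Collapsing whole connected components is crude (a chain of acquaintances would also collapse), but it shows the effect.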

In the Netflix case, if someone likes Spiderman 1, Spiderman 2, and Spiderman 3, those really aren't three independent samples. On the other hand, Spiderman 1 and Batman 1 should be treated as more independent.

It seems to me that there should be enough in the data to extract some of this dependence.
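One simple way to pull that dependence out of the ratings themselves might be to compare the audiences of two movies: if nearly everyone who liked one also liked the other, they are probably redundant evidence. The sketch below is just a heuristic I'm assuming for illustration, not anything Netflix is known to do. It uses Jaccard overlap between sets of users, and the 0.5 threshold and function names are hypothetical. Each liked movie gets weight 1 divided by the number of the user's liked movies that look like it, and the sum of weights acts as an "effective number of independent samples."

```python
def likers_by_item(likes):
    """Invert user -> liked items into item -> set of users who liked it."""
    by_item = {}
    for user, items in likes.items():
        for item in items:
            by_item.setdefault(item, set()).add(user)
    return by_item


def jaccard(a, b):
    """Audience overlap between two items: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def independence_weights(liked, by_item, threshold=0.5):
    """Weight each liked item by 1 / (number of liked items similar to it).

    Assumption: two items whose audiences overlap with Jaccard >= threshold
    are redundant. An item always counts itself, so an item unlike any
    other the user liked keeps weight 1.0.
    """
    weights = {}
    for i in liked:
        similar = 1 + sum(
            1 for j in liked
            if j != i and jaccard(by_item.get(i, set()), by_item.get(j, set())) >= threshold
        )
        weights[i] = 1.0 / similar
    return weights


likes = {
    "alice": {"Spiderman 1", "Spiderman 2", "Spiderman 3"},
    "bob":   {"Spiderman 1", "Spiderman 2", "Spiderman 3", "Batman 1"},
    "carol": {"Batman 1"},
    "dave":  {"Batman 1"},
}
by_item = likers_by_item(likes)
w = independence_weights(likes["bob"], by_item)
print(len(likes["bob"]))          # 4 naive samples
print(round(sum(w.values()), 2))  # 2.0 effective samples
```

In this toy data the three Spiderman films share an audience, so each gets weight 1/3, while Batman 1 keeps weight 1. Bob's four likes then count as roughly two independent samples.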
