Sample completion, structured correlation, and Netflix problems
This provides a theoretical foundation for understanding algorithm success in recommendation systems like Netflix, though it appears incremental as it builds on existing classification theory.
The authors tackled the problem of learning from high-dimensional data with structured correlation and randomness, completely characterizing learnability in terms of VCN_{k,k}-dimension and linking it to algorithms from the 2006 Netflix Prize.
We develop a new high-dimensional statistical learning model which can take advantage of structured correlation in data even in the presence of randomness. We completely characterize learnability in this model in terms of VCN${}_{k,k}$-dimension (essentially $k$-dependence from Shelah's classification theory). This model suggests a theoretical explanation for the success of certain algorithms in the 2006~Netflix Prize competition.