Probabilistic low-rank matrix completion on finite alphabets
This work addresses matrix completion for discrete data in applications like recommender systems, but it is incremental as it extends existing methods to finite alphabets and non-uniform sampling.
The paper tackles the matrix completion problem for finite-alphabet data, such as ratings or labels, under non-uniform sampling, by analyzing a nuclear-norm penalized estimator and deriving theoretical bounds for Kullback-Leibler divergence, while also proposing an efficient algorithm for high-dimensional settings.
The task of reconstructing a matrix given a sample of observedentries is known as the matrix completion problem. It arises ina wide range of problems, including recommender systems, collaborativefiltering, dimensionality reduction, image processing, quantum physics or multi-class classificationto name a few. Most works have focused on recovering an unknown real-valued low-rankmatrix from randomly sub-sampling its entries.Here, we investigate the case where the observations take a finite number of values, corresponding for examples to ratings in recommender systems or labels in multi-class classification.We also consider a general sampling scheme (not necessarily uniform) over the matrix entries.The performance of a nuclear-norm penalized estimator is analyzed theoretically.More precisely, we derive bounds for the Kullback-Leibler divergence between the true and estimated distributions.In practice, we have also proposed an efficient algorithm based on lifted coordinate gradient descent in order to tacklepotentially high dimensional settings.