LG ML · Nov 17, 2025

Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information

Antoine Ledent, Mun Chong Soo, Nong Minh Hieu

arXiv:2511.13049v1 · 4.1 · h-index: 7

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of leveraging both explicit and implicit feedback in recommender systems. It is an incremental advance, since it builds on existing theory of low-rank subspace recovery and matrix completion.

The paper tackles matrix completion in a semi-supervised setting where the ground-truth matrix and the sampling distribution share a low-rank subspace, combining unlabeled implicit feedback with labeled explicit feedback. It derives error bounds with two terms scaling as Õ(√(nd/M)) and Õ(√(dr/N)), and reports improved performance on real datasets such as Douban and MovieLens compared with baselines that use only explicit ratings.
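Since Õ hides constants and logarithmic factors, only the relative scaling of the two terms is meaningful. The helper below evaluates them with those factors dropped. The function name and the example sizes are hypothetical, and $n$ is treated as the matrix dimension, which is an assumption because the summary does not define it.

```python
import math


def bound_terms(n, d, r, M, N):
    """Scaling of the two error terms, with the constants and log factors hidden by Õ dropped."""
    implicit_term = math.sqrt(n * d / M)  # cost of estimating P's subspace from M unlabeled samples
    explicit_term = math.sqrt(d * r / N)  # cost of fitting R from N labeled ratings
    return implicit_term, explicit_term


t1, t2 = bound_terms(n=1000, d=10, r=5, M=1_000_000, N=10_000)
print(f"{t1:.4f} {t2:.4f}")  # 0.1000 0.0707
```

In this reading, only the first term grows with $n$. Abundant implicit feedback (large $M$) pays for the dimension-dependent part, while the scarce explicit ratings only need to cover roughly $dr$ degrees of freedom. For example, multiplying $M$ by 100 shrinks the first term by a factor of 10 and leaves the second unchanged.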

We study a matrix completion problem in which both the ground-truth matrix $R$ and the unknown sampling distribution $P$ over observed entries are low-rank matrices that share a common subspace. We assume access to a large number $M$ of unlabeled samples drawn from $P$, together with a small number $N$ of labeled samples drawn from the same distribution, each accompanied by a noisy estimate of the corresponding ground-truth entry. This setting is inspired by recommender systems, where the unlabeled data correspond to 'implicit feedback' (interactions such as purchases or clicks) and the labeled data correspond to 'explicit feedback' (interactions in which the user has given the item an explicit rating). Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two terms, scaling as $\widetilde{O}\left(\sqrt{\frac{nd}{M}}\right)$ and $\widetilde{O}\left(\sqrt{\frac{dr}{N}}\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $R$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimation of $P$ and of the ground-truth matrix $R$, respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we demonstrate that the method can outperform baselines relying only on the explicit ratings, suggesting that our assumptions provide a valid toy theoretical setting for studying the interaction between explicit and implicit feedback in recommender systems.
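The setting can be illustrated with a small NumPy simulation. This is a minimal sketch under assumptions that are not taken from the paper. $P$ is built as a nonnegative rank-$d$ product normalized to sum to 1. $R = U A V^\top$ is assumed to lie in $P$'s column and row spaces, with a rank-$r$ core $A$. The estimator is an illustrative two-stage procedure that is not necessarily the authors' method: a truncated SVD of the empirical sampling matrix from the $M$ unlabeled samples, followed by least squares for a $d \times d$ core on the $N$ labeled samples. The name `simulate` and its parameters are hypothetical.

```python
import numpy as np


def simulate(n1=50, n2=50, d=3, r=2, M=1_000_000, N=2_000, noise=0.1, seed=0):
    """Toy semi-supervised matrix completion (ASSUMED model: R = U A V^T in P's subspaces)."""
    rng = np.random.default_rng(seed)

    # Sampling distribution P: nonnegative rank-d matrix normalized to sum to 1.
    W, H = rng.random((n1, d)), rng.random((d, n2))
    P = W @ H
    P /= P.sum()

    # Ground truth R: rank-r core inside P's column/row spaces, rescaled to unit RMS.
    U = np.linalg.qr(W)[0]    # n1 x d orthonormal basis of col(P)
    V = np.linalg.qr(H.T)[0]  # n2 x d orthonormal basis of row(P)
    A_true = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
    R = U @ A_true @ V.T
    R /= np.sqrt(np.mean(R ** 2))

    # Stage 1: M unlabeled samples (implicit feedback) -> empirical P -> top-d SVD.
    idx = rng.choice(n1 * n2, size=M, p=P.ravel())
    P_hat = np.bincount(idx, minlength=n1 * n2).reshape(n1, n2) / M
    U_svd, _, Vt_svd = np.linalg.svd(P_hat, full_matrices=False)
    U_hat, V_hat = U_svd[:, :d], Vt_svd[:d].T

    # Stage 2: N labeled samples (explicit ratings) from the same P, with Gaussian noise.
    lab = rng.choice(n1 * n2, size=N, p=P.ravel())
    rows, cols = np.unravel_index(lab, (n1, n2))
    y = R[rows, cols] + noise * rng.standard_normal(N)

    def fit(Ub, Vb):
        # Least squares for the d x d core A in R ~ Ub A Vb^T (rank-r constraint dropped).
        X = np.einsum("na,nb->nab", Ub[rows], Vb[cols]).reshape(N, d * d)
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        return Ub @ coef.reshape(d, d) @ Vb.T

    def p_rmse(R_est):
        # Root-mean-square error weighted by the sampling distribution P.
        return float(np.sqrt(np.sum(P * (R_est - R) ** 2)))

    return {
        "two_stage": p_rmse(fit(U_hat, V_hat)),
        "oracle_subspace": p_rmse(fit(U, V)),
        "zero_predictor": p_rmse(np.zeros_like(R)),
    }


print(simulate())
```

Within this toy model, `oracle_subspace` isolates the error of fitting the core from the $N$ labeled ratings. The gap between `two_stage` and `oracle_subspace` reflects the error of estimating $P$'s subspace from the $M$ unlabeled samples. Varying `M` and `N` separately is one way to probe the two-term decomposition described above.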

