ML · LG · ME · Oct 17, 2024

Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space

Harvard · MIT
arXiv:2410.13112v2 · 3 citations · h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the problem of imputing distributions in incomplete matrices for applications like earnings prediction, representing an incremental extension of traditional matrix completion to distributional data.

The paper tackles distributional matrix completion: imputing the true distributions behind a sparsely observed matrix of empirical distributions, using optimal transport to generalize nearest neighbors to the distributional setting. Simulations show that the method gives better distributional estimates for an entry than that entry's observed samples alone, yields accurate estimates of quantities such as standard deviation and value-at-risk, and supports heteroscedastic distributions.

We study the problem of distributional matrix completion: Given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries. This is a generalization of traditional matrix completion, where the observations per matrix entry are scalar-valued. To do so, we utilize tools from optimal transport to generalize the nearest neighbors method to the distributional setting. Under a suitable latent factor model on probability distributions, we establish that our method recovers the distributions in the Wasserstein metric. We demonstrate through simulations that our method (i) provides better distributional estimates for an entry compared to using observed samples for that entry alone, (ii) yields accurate estimates of distributional quantities such as standard deviation and value-at-risk, and (iii) inherently supports heteroscedastic distributions. In addition, we demonstrate our method on a real-world dataset of quarterly earnings prediction distributions. We also prove novel asymptotic results for Wasserstein barycenters over one-dimensional distributions.
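As a rough illustration of the idea in the abstract, the sketch below imputes one matrix entry from row-wise nearest neighbors in the 2-Wasserstein metric. It is a minimal sketch under stated assumptions, not the authors' implementation: it relies on the standard facts that, for one-dimensional distributions, the 2-Wasserstein distance is the L2 distance between quantile functions and the Wasserstein barycenter is the average of quantile functions. The names `w2_distance`, `barycenter_quantiles`, `impute_entry`, and the threshold `radius` are hypothetical, as is the synthetic Gaussian location-scale data.

```python
import numpy as np

def w2_distance(x, y, grid):
    """Approximate 2-Wasserstein distance between two 1D samples,
    computed as the L2 distance between empirical quantile functions."""
    return np.sqrt(np.mean((np.quantile(x, grid) - np.quantile(y, grid)) ** 2))

def barycenter_quantiles(samples_list, grid):
    """Quantile function of the 1D Wasserstein-2 barycenter:
    the pointwise average of the input quantile functions."""
    return np.mean([np.quantile(s, grid) for s in samples_list], axis=0)

def impute_entry(data, i, j, radius, grid):
    """Estimate the distribution at (i, j) from rows that look like row i.

    data maps (row, col) -> 1D array of samples; missing entries are absent.
    A row u is a neighbor if it is observed in column j and its average
    squared W2 distance to row i over shared columns (excluding j) is at
    most `radius` (a hypothetical tuning parameter).
    """
    n_rows = 1 + max(r for r, _ in data)
    n_cols = 1 + max(c for _, c in data)
    neighbors = []
    for u in range(n_rows):
        if (u, j) not in data:
            continue
        shared = [c for c in range(n_cols)
                  if c != j and (i, c) in data and (u, c) in data]
        if not shared:
            continue
        avg_sq = np.mean([w2_distance(data[(i, c)], data[(u, c)], grid) ** 2
                          for c in shared])
        if avg_sq <= radius:
            neighbors.append(u)
    if not neighbors:
        return None
    return barycenter_quantiles([data[(u, j)] for u in neighbors], grid)

# Synthetic example (assumed model): entry (r, c) is Gaussian with a
# row-dependent scale, so the matrix is heteroscedastic.
rng = np.random.default_rng(0)
n_rows, n_cols, n_samples = 30, 20, 50
row_factor = np.where(np.arange(n_rows) % 2 == 0, 1.0, 3.0)  # two row types
col_factor = rng.uniform(0.5, 2.0, size=n_cols)

data = {}
for r in range(n_rows):
    for c in range(n_cols):
        if (r, c) == (0, 0) or rng.random() < 0.3:  # hold out target, drop ~30%
            continue
        data[(r, c)] = rng.normal(row_factor[r] * col_factor[c],
                                  0.5 * row_factor[r], size=n_samples)

grid = np.linspace(0.01, 0.99, 99)
est = impute_entry(data, 0, 0, radius=0.5, grid=grid)

est_mean = est.mean()
est_std = np.sqrt(np.mean((est - est_mean) ** 2))
est_var5 = np.interp(0.05, grid, est)  # 5% quantile, a value-at-risk style summary
print("true mean/std:", col_factor[0], 0.5)
print("estimated mean/std/5% quantile:", est_mean, est_std, est_var5)
```

Because the estimate is a full quantile function, summaries such as the standard deviation or a lower-tail quantile can be read off it directly, even though entry (0, 0) itself has no observed samples in this example.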

Code Implementations: 1 repo
