ML, LG · Feb 7, 2025

Scalable and consistent embedding of probability measures into Hilbert spaces via measure quantization

Erell Gachon, Elsa Cazelles, Jérémie Bigot

arXiv:2502.04907v3 · 7.81 citations · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses scalability issues for researchers and practitioners using probability measure embeddings in machine learning, offering an incremental improvement over existing methods.

The paper tackles the computational bottleneck of embedding probability measures into Hilbert spaces for large-scale statistical learning by proposing two quantization-based approximation methods, achieving scalable embeddings with proven consistency and low computational cost.
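As a minimal illustration of why support size drives the cost, here is a sketch of the Kernel Mean Embedding of a discrete measure μ = Σ a_i δ_{x_i}, which maps μ to m_μ = Σ a_i k(x_i, ·) in the RKHS of a kernel k. The inner product between two embeddings needs the full cross Gram matrix, so its cost grows with the product of the two support sizes; replacing each measure by one with a small support is what makes this cheap. The Gaussian kernel, its bandwidth sigma and the function names below are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2)) (assumed kernel)."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))

def kme_inner(X, a, Y, b, sigma=1.0):
    """<m_mu, m_nu> for mu = sum_i a_i delta_{x_i}, nu = sum_j b_j delta_{y_j}.
    Cost is O(n * m * d) for supports of sizes n and m in dimension d."""
    return a @ gaussian_gram(X, Y, sigma) @ b

def mmd2(X, a, Y, b, sigma=1.0):
    """Squared RKHS distance ||m_mu - m_nu||^2 between the two mean embeddings."""
    return (kme_inner(X, a, X, a, sigma) + kme_inner(Y, b, Y, b, sigma)
            - 2 * kme_inner(X, a, Y, b, sigma))
```

With supports of size n, each call builds n × n matrices; after quantizing both measures to K points the same quantities only need K × K matrices.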

This paper is focused on statistical learning from data that come in the form of probability measures. In this setting, popular approaches consist of embedding such data into a Hilbert space with either Linearized Optimal Transport or Kernel Mean Embedding. However, the cost of computing such embeddings prohibits their direct use in large-scale settings. We study two methods based on measure quantization for approximating input probability measures with discrete measures of small support size. The first one is based on optimal quantization of each input measure, while the second one relies on mean-measure quantization. We study the consistency of such approximations, and their implications for scalable embeddings of probability measures into a Hilbert space at a low computational cost. We finally illustrate our findings with various numerical experiments.
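The two strategies in the abstract can be sketched as follows, under our own reading of it: (1) quantize each input measure separately into K weighted points; (2) quantize the mean measure (1/N) Σ_j μ_j once, then push every μ_j onto that shared set of K centroids, weighting each centroid by the mass of μ_j falling in its Voronoi cell. Using weighted Lloyd (k-means) iterations as the quantizer is an assumption: Lloyd is a standard local heuristic for optimal quantization, not guaranteed to find the optimum, and the paper's exact construction may differ. All function names are hypothetical.

```python
import numpy as np

def assign(X, C):
    """Index of the nearest centroid (Voronoi cell) for each row of X."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1)

def lloyd(X, w, K, n_iter=50, seed=0):
    """Weighted Lloyd iterations quantizing sum_i w_i delta_{x_i} with K centroids.
    Initialization: K distinct data points drawn at random (assumed choice)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, C)
        for k in range(K):
            mask = labels == k
            if w[mask].sum() > 0:  # empty cells keep their previous centroid
                C[k] = np.average(X[mask], axis=0, weights=w[mask])
    return C

def project(X, w, C):
    """Weight of centroid c_k = total mass of the measure in the Voronoi cell of c_k."""
    return np.bincount(assign(X, C), weights=w, minlength=len(C))

def per_measure_quantization(measures, K):
    """Strategy 1 (hypothetical sketch): each measure gets its own K-point support."""
    out = []
    for X, w in measures:
        C = lloyd(X, w, K)
        out.append((C, project(X, w, C)))
    return out

def mean_measure_quantization(measures, K):
    """Strategy 2 (hypothetical sketch): quantize the mean measure once, then
    project every input measure onto the shared K-point support."""
    N = len(measures)
    X_all = np.vstack([X for X, _ in measures])
    w_all = np.concatenate([w / N for _, w in measures])  # weights of the mean measure
    C = lloyd(X_all, w_all, K)
    return C, [project(X, w, C) for X, w in measures]
```

With the shared support from strategy 2, every measure is summarized by a length-K weight vector on the same centroids, so a single K × K Gram matrix, computed once, serves all pairwise kernel computations.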
