A Note on Optimizing Distributions using Kernel Mean Embeddings
This note addresses a limitation in applying kernel mean embeddings to distribution optimization, a problem of interest to machine learning researchers. The contribution appears incremental, since it builds on existing parameterization methods.
The paper tackles distribution optimization with kernel mean embeddings by proposing a kernel sum-of-squares parameterization to fit distributions in the MMD geometry. It shows that distributions with such densities are dense when the kernel is characteristic, and it provides finite-sample algorithms, illustrated in a density fitting experiment.
Kernel mean embeddings are a popular tool that represents probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used to define a distance between probability measures, known as the maximum mean discrepancy (MMD). A well-known advantage of mean embeddings and MMD is their low computational cost and low sample complexity. However, kernel mean embeddings have had limited applications to problems that involve optimizing distributions, due to the difficulty of characterizing which Hilbert space vectors correspond to a probability distribution. In this note, we propose to leverage the kernel sums-of-squares parameterization of positive functions of Marteau-Ferey et al. [2020] to fit distributions in the MMD geometry. First, we show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense. Then, we provide algorithms to optimize such distributions in the finite-sample setting, which we illustrate in a density fitting numerical experiment.
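
As a rough illustration of the MMD mentioned above, the following minimal sketch estimates the squared MMD between two samples by comparing their empirical mean embeddings. It is not taken from the note: the Gaussian kernel, the `bandwidth` parameter, the biased (V-statistic) estimator, and the function names `gaussian_kernel` and `mmd_squared` are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an illustrative choice (it is characteristic);
    the note itself does not prescribe a specific kernel here.
    """
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 between samples x and y.

    Equals the squared RKHS distance between the two empirical
    mean embeddings: mean(K_xx) + mean(K_yy) - 2 * mean(K_xy).
    """
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Hypothetical usage: two samples from the same Gaussian versus a shifted one.
rng = np.random.default_rng(0)
same = mmd_squared(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
shifted = mmd_squared(
    rng.normal(size=(200, 2)), rng.normal(loc=2.0, size=(200, 2))
)
```

Each evaluation costs only three Gram matrices, which reflects the low computational cost the abstract refers to. In this setup one would expect `shifted` to be noticeably larger than `same`, since the second pair of samples comes from different distributions.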