ML · LG · FA · OC · Aug 29, 2024

Statistical and Geometrical properties of regularized Kernel Kullback-Leibler divergence

arXiv:2408.16543v2 · 4 citations · h-index: 12
AI Analysis

This work addresses a theoretical gap in kernel-based divergence measures for machine learning practitioners dealing with probability distributions, though it appears incremental as it builds directly on prior work.

The kernel Kullback-Leibler (KKL) divergence is undefined for distributions with disjoint supports. The paper proposes a regularized variant that is well defined for all distributions, and provides bounds on its deviation from the original KKL, finite-sample bounds, a closed-form expression for distributions supported on finite sets of points, and a Wasserstein gradient descent scheme for discrete distributions.

In this paper, we study the statistical and geometrical properties of the Kullback-Leibler divergence with kernel covariance operators (KKL) introduced by Bach [2022]. Unlike the classical Kullback-Leibler (KL) divergence, which involves density ratios, the KKL compares probability distributions through their covariance operators (embeddings) in a reproducing kernel Hilbert space (RKHS) and computes the Kullback-Leibler quantum divergence between them. This novel divergence hence shares parallel but different aspects with both the standard Kullback-Leibler divergence between probability distributions and kernel embedding metrics such as the maximum mean discrepancy. A limitation of the original KKL divergence is that it is not defined for distributions with disjoint supports. To solve this problem, we propose a regularised variant that guarantees that the divergence is well defined for all distributions. We derive bounds that quantify the deviation of the regularised KKL from the original one, as well as finite-sample bounds. In addition, we provide a closed-form expression for the regularised KKL that applies when the distributions are supported on finite sets of points, which makes it implementable. Furthermore, we derive a Wasserstein gradient descent scheme for the KKL divergence in the case of discrete distributions, and study empirically its ability to transport a set of points to a target distribution.
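The closed-form expression mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the text above does not state: that the KKL is Tr[Σ_p (log Σ_p − log Σ_q)] for covariance operators Σ_p and Σ_q, that the regularised variant is KKL_α(p‖q) = KKL(p ‖ (1−α)q + αp) for α in (0, 1), and that the kernel is Gaussian with k(x, x) = 1. Under those assumptions, for empirical measures on points x_1..x_n and y_1..y_m the divergence reduces to traces of Gram matrices. The function names (`gaussian_gram`, `sym_xlogx`, `regularized_kkl`) are hypothetical and are not taken from the paper's repository.

```python
import numpy as np


def gaussian_gram(a, b, sigma=1.0):
    """Gaussian kernel Gram matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def sym_xlogx(mat, eps=1e-12):
    """Return M log M for a symmetric PSD matrix M, with 0 log 0 := 0."""
    w, v = np.linalg.eigh((mat + mat.T) / 2.0)
    keep = w > eps                      # treat tiny or negative eigenvalues as zero
    safe = np.where(keep, w, 1.0)       # avoids log(0) warnings
    f = np.where(keep, w * np.log(safe), 0.0)
    return (v * f) @ v.T                # V diag(f) V^T


def regularized_kkl(x, y, alpha=0.1, sigma=1.0):
    """Sketch of KKL_alpha between the empirical measures on x and y.

    ASSUMPTION: KKL_alpha(p || q) = KKL(p || (1 - alpha) q + alpha p).
    """
    n, m = len(x), len(y)
    kx = gaussian_gram(x, x, sigma)
    ky = gaussian_gram(y, y, sigma)
    kxy = gaussian_gram(x, y, sigma)

    # Tr[Sigma_p log Sigma_p] equals Tr[(Kx/n) log(Kx/n)]: same nonzero spectrum.
    first = np.trace(sym_xlogx(kx / n))

    # Gram matrix of the stacked, reweighted features of the mixture.
    c = np.sqrt(alpha * (1.0 - alpha) / (n * m))
    big = np.block([[alpha / n * kx, c * kxy],
                    [c * kxy.T, (1.0 - alpha) / m * ky]])

    # Tr[Sigma_p log Sigma_mix] = (1/alpha) * trace of the top-left n x n
    # block of K log K, where K is the block matrix above.
    second = np.trace(sym_xlogx(big)[:n, :n]) / alpha
    return first - second


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 2))
    y = rng.normal(size=(8, 2)) + 50.0   # supports far apart
    print(regularized_kkl(x, x, alpha=0.3))
    print(regularized_kkl(x, y, alpha=0.3))
```

Under these assumptions the sketch returns a value close to zero when both point sets coincide, and a finite value even when the two supports are far apart, which is the situation the regularisation is meant to handle. The eigenvalue threshold `eps` is a numerical convenience chosen for this illustration.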

Code Implementations: 1 repo
