ML LG FA PR

Apr 10, 2025

Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows

Nicolaj Rux, Michael Quellmalz, Gabriele Steidl

arXiv:2504.07820v2, 1 citation, h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses a theoretical limitation in kernel methods for statistics and machine learning, offering a smoothed alternative that enables rigorous analysis in applications like Wasserstein gradient flows, though it is incremental as it modifies an existing kernel rather than introducing a new paradigm.

The paper tackled the problem of non-smoothness in negative distance kernels used in maximum mean discrepancies (MMDs), which hindered theoretical guarantees for Wasserstein gradient flows, by proposing a new Lipschitz differentiable kernel that maintains favorable properties like conditional positive definiteness and simple slicing structure. Numerical results showed the new kernel performs similarly well in gradient descent methods while providing theoretical assurances.

Negative distance kernels $K(x,y) := - \|x-y\|$ are used in the definition of maximum mean discrepancies (MMDs) in statistics and lead to favorable numerical results in various applications. In particular, so-called slicing techniques for handling high-dimensional kernel summations profit from the simple parameter-free structure of the distance kernel. However, due to its non-smoothness at $x=y$, most of the classical theoretical results, e.g. on Wasserstein gradient flows of the corresponding MMD functional, no longer hold true. In this paper, we propose a new kernel which keeps the favorable properties of the negative distance kernel, namely being conditionally positive definite of order one with a nearly linear increase towards infinity and a simple slicing structure, but is now Lipschitz differentiable. Our construction is based on a simple 1D smoothing procedure of the absolute value function followed by a Riemann-Liouville fractional integral transform. Numerical results demonstrate that the new kernel performs about as well as the negative distance kernel in gradient descent methods, but now with theoretical guarantees.
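To make the objects in the abstract concrete, the following is a minimal NumPy sketch, not the authors' code. It covers only the classical, unsmoothed negative distance kernel $K(x,y) = -\|x-y\|$: a biased (V-statistic) estimate of the squared MMD between two point clouds, and a Monte Carlo check of the standard slicing identity $\|v\| = c_d \, \mathbb{E}_\xi |\langle \xi, v \rangle|$ for $\xi$ uniform on the unit sphere, with $c_d = \sqrt{\pi}\,\Gamma((d+1)/2)/\Gamma(d/2)$. The function names, sample sizes, and number of directions are illustrative assumptions; the paper's smoothed kernel is not reproduced here because its exact form is not given in this abstract.

```python
import math

import numpy as np


def negative_distance_kernel(x, y):
    """Gram matrix K[i, j] = -||x_i - y_j|| for x of shape (n, d), y of shape (m, d)."""
    diff = x[:, None, :] - y[None, :, :]
    return -np.linalg.norm(diff, axis=-1)


def mmd2(x, y):
    """Biased (V-statistic) estimate of MMD^2 between the empirical measures of x and y."""
    kxx = negative_distance_kernel(x, x).mean()
    kyy = negative_distance_kernel(y, y).mean()
    kxy = negative_distance_kernel(x, y).mean()
    return kxx + kyy - 2.0 * kxy


def sliced_norm(v, num_directions=200_000, seed=0):
    """Monte Carlo estimate of ||v|| from 1D projections onto random unit directions."""
    rng = np.random.default_rng(seed)
    d = v.shape[0]
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    xi = rng.standard_normal((num_directions, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    # Constant c_d such that ||v|| = c_d * E|<xi, v>|.
    c_d = math.sqrt(math.pi) * math.gamma((d + 1) / 2) / math.gamma(d / 2)
    return c_d * np.abs(xi @ v).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((50, 3))
    y = rng.standard_normal((60, 3)) + 2.0  # shifted cloud, so MMD^2 should be positive
    print("MMD^2(x, y):", mmd2(x, y))
    v = np.array([1.0, -2.0, 3.0])
    print("sliced vs exact norm:", sliced_norm(v), np.linalg.norm(v))
```

The slicing identity is what lets a high-dimensional sum over $\|x_i - y_j\|$ be replaced by averages of 1D sums over $|\langle \xi, x_i \rangle - \langle \xi, y_j \rangle|$. The absolute value in that 1D problem is also where the non-smoothness at $x=y$ mentioned in the abstract comes from.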
