IT · May 14

A Global Characterization of $f$-Divergences Yielding PSD Mutual-Information Matrices

arXiv:2601.08929 · 16.6
Predicted impact top 75% in IT · last 90 days
Originality: Highly original
AI Analysis

This provides a fundamental theoretical characterization for kernel methods in information theory, clarifying which divergences can define valid kernels over variables.

The authors characterize which f-divergences yield positive semidefinite (PSD) mutual-information matrices for any finite set of random variables. They prove that the matrix is PSD for all finite-alphabet families if and only if the normalized generator has a globally convergent expansion with nonnegative coefficients, explaining why Shannon MI and Jensen-Shannon fail while χ² succeeds.

Given $n$ random variables, when does the matrix of pairwise $f$-mutual informations define a PSD kernel over variables? For convex finite generators $f:(0,\infty)\to\mathbb{R}$ with $f(1)=0$ and finite boundary value $f(0)$, we give a closed characterization up to the addition of a linear term, $f\sim f+c(t-1)$, which leaves every $f$-divergence and every $f$-mutual-information matrix unchanged. The matrix $M^{(f)}_{ij}:=I_f(X_i;X_j)$ is PSD for every finite-alphabet family if and only if the normalized representative has a globally convergent expansion $\bar f(t)=\sum_{m\ge2}a_m(t-1)^m$, with $a_m\ge0$, on all of $(0,\infty)$. Sufficiency follows from a replica embedding for monomial generators plus closure under nonnegative mixtures. Necessity first extracts the local Taylor cone at $1$ using biased three-point kernels $H_a$ and the Belton--Guillot--Khare--Putinar (BGKP) low-rank Hankel positivity-preserver theorem, and then bootstraps analyticity to the divergence. This is a kernel characterization problem, not a metric one: PSD of the variable-indexed matrix is distinct from Hilbertian properties of divergences between distributions. The result explains why Shannon MI and Jensen--Shannon fail, why $\chi^2$ succeeds, and why non-analytic divergences such as total variation and ReLU are excluded.
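To make the object in the abstract concrete, the following is a minimal numerical sketch, not the authors' code. It assumes the standard definition $I_f(X_i;X_j)=D_f(P_{ij}\,\|\,P_i\otimes P_j)$, with the diagonal entry taken from the joint law of $(X_i,X_i)$; the abstract does not spell this out. The helper names `pairwise_joint`, `f_mutual_information`, and `f_mi_matrix` are hypothetical. The sketch builds $M^{(f)}$ for the $\chi^2$ generator $f(t)=(t-1)^2$ on a random joint distribution and inspects its smallest eigenvalue.

```python
import numpy as np

def pairwise_joint(joint, i, j):
    """Joint pmf of (X_i, X_j) as a 2-D array indexed [x_i, x_j].

    `joint` is an n-dimensional array of probabilities, one axis per variable.
    For i == j the pair (X_i, X_i) lives on the diagonal.
    """
    n = joint.ndim
    if i == j:
        p = joint.sum(axis=tuple(k for k in range(n) if k != i))
        return np.diag(p)
    others = tuple(k for k in range(n) if k not in (i, j))
    pij = joint.sum(axis=others)  # remaining axes are ordered (min(i,j), max(i,j))
    return pij if i < j else pij.T

def f_mutual_information(pij, f):
    """Assumed definition: D_f(P_ij || P_i x P_j) = sum_{x,y} p_i p_j f(p_ij / (p_i p_j))."""
    prod = np.outer(pij.sum(axis=1), pij.sum(axis=0))
    mask = prod > 0  # cells with zero product mass contribute nothing
    return float(np.sum(prod[mask] * f(pij[mask] / prod[mask])))

def f_mi_matrix(joint, f):
    """Matrix M[i, j] = I_f(X_i; X_j) over all pairs of variables."""
    n = joint.ndim
    M = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            M[i, j] = f_mutual_information(pairwise_joint(joint, i, j), f)
    return M

def chi2(t):
    # chi-squared generator: a single monomial (t - 1)^2 with coefficient a_2 = 1 >= 0
    return (t - 1.0) ** 2

# Random strictly positive joint pmf over four variables with alphabet sizes 2, 3, 2, 3.
rng = np.random.default_rng(0)
joint = rng.random((2, 3, 2, 3))
joint /= joint.sum()

M = f_mi_matrix(joint, chi2)
min_eig = np.linalg.eigvalsh(M).min()
print("smallest eigenvalue of the chi^2 MI matrix:", min_eig)
```

Passing a different generator to `f_mi_matrix` (one that is finite at $t=0$, since the diagonal blocks evaluate $f(0)$) lets one probe other divergences numerically. A single random draw cannot confirm or refute the theorem; it only illustrates the matrix being characterized.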

