ML · LG · PR · May 9

Measuring and Decomposing Mode Separation via the Canonical Diffusion

arXiv:2605.0877713.1
Predicted impact: top 81% in ML (last 90 days) · Originality: incremental advance
AI Analysis

For practitioners analyzing complex high-dimensional distributions, the paper provides a principled way to quantify fragmentation (mode separation), which existing tools such as entropy or PCA miss.

The paper introduces a method to measure mode separation in high-dimensional densities using a reversible diffusion process, deriving two readouts (SSA and DA) that capture barrier-sensitive structure. It is applied to synthetic mixtures, text-to-image models, and molecular dynamics: on the synthetic mixtures SSA tracks mutual information, and on the molecular system DA recovers the known slow degrees of freedom.
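As a worked illustration, the Gaussian case can be solved in closed form. This sketch assumes that, for a density $f$ on $\mathbb{R}^d$, the canonical diffusion is the overdamped Langevin diffusion with diffusion coefficient normalized to 1. It also assumes that SSA is the sum of squared eigenvalues of the whitened lag-$\tau$ autocorrelation matrix. Only the population-level picture is shown; the paper's null result concerns the finite-sample spectrum of the empirical autocovariance, which is not derived here.

```latex
% Canonical (overdamped Langevin) diffusion for a density f, started at stationarity
dX_t = \nabla \log f(X_t)\,dt + \sqrt{2}\,dW_t, \qquad X_0 \sim f

% Lag-tau autocovariance and whitened autocorrelation matrix
C(\tau) = \mathbb{E}\big[(X_0-\mu)(X_\tau-\mu)^{\top}\big], \qquad
R(\tau) = C(0)^{-1/2}\, C(\tau)\, C(0)^{-1/2}

% Gaussian f = N(mu, Sigma): score is -Sigma^{-1}(x - mu), so X_t is Ornstein-Uhlenbeck
X_\tau - \mu = e^{-\tau\Sigma^{-1}}(X_0-\mu) + \xi_\tau, \quad \xi_\tau \perp X_0
\;\Rightarrow\; C(\tau) = \Sigma\, e^{-\tau\Sigma^{-1}}, \qquad R(\tau) = e^{-\tau\Sigma^{-1}}

% Eigenvalues lambda_i = e^{-tau/s_i} for variances s_i; isotropic Sigma = sigma^2 I gives
R(\tau) = e^{-\tau/\sigma^2} I, \qquad
\mathrm{SSA}(\tau) = \sum_{i=1}^{d} \lambda_i^2 = d\, e^{-2\tau/\sigma^2}
```

Because $e^{-\tau/s_i}$ increases with the variance $s_i$, on any Gaussian the slowest-decorrelating directions are exactly the largest-variance ones, so a metastability ordering coincides with PCA. Only non-Gaussian structure, such as barriers between modes, can make the two orderings disagree.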

Mode separation, namely how sharply a distribution fragments into barrier-separated clusters, is a fundamental geometric property of densities, yet it is difficult to quantify in high dimensions. It is structurally distinct from dispersion, and existing tools fall short: differential entropy rises with spread regardless of fragmentation, PCA orders directions by variance regardless of barriers, and mutual information requires a mixture decomposition that one usually does not have. We measure mode separation through a single stochastic process intrinsic to the density $f$: the unique reversible diffusion that has $f$ as its stationary distribution and a constant scalar diffusion coefficient. We extract two readouts from its autocovariance matrix: SSA (Sum of Squared Autocorrelations), a scalar barrier-sensitive measure; and DA (Dominant Autocorrelation directions), linear projections ordered by metastability rather than variance. Under an isotropic-Gaussian null, we derive a closed-form spectrum for the empirical autocovariance that generalizes the Marchenko–Pastur law, with an analytic upper edge that selects the lag at which DA is read off. Both readouts require only samples and a score function, and they scale to high dimensions through pretrained score-based generative models via Tweedie's identity. We apply the framework to three settings: (i) synthetic Gaussian mixtures, where SSA tracks mutual information; (ii) SDXL text-to-image generations, where SSA and DA capture structure that entropy and PCA miss; and (iii) molecular dynamics of alanine dipeptide, where DA recovers the known slow backbone dihedrals from static samples alone.
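The pipeline described in the abstract can be sketched in a toy setting: start from static samples of $f$, run the diffusion for a lag $\tau$ using only the score, then read SSA and DA off the lag-$\tau$ autocovariance. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the canonical diffusion is overdamped Langevin, $dX = \nabla\log f(X)\,dt + \sqrt{2}\,dW$, simulated with Euler–Maruyama. It takes SSA to be the squared Frobenius norm of the whitened autocorrelation matrix. DA is computed as the TICA-style generalized eigenvectors of $C(\tau)v = \lambda C(0)v$. The analytic mixture score stands in for a pretrained score model and Tweedie's identity, and the lag is fixed by hand rather than chosen from the analytic spectral edge. All function names (`mixture_score`, `sample_mixture`, `langevin`, `readouts`) and parameter values are hypothetical.

```python
import numpy as np


def mixture_score(x, means, std):
    """Score grad log f(x) of an equal-weight Gaussian mixture (hypothetical helper).

    All components share the diagonal standard deviation `std` (scalar or shape (d,)).
    x: (n, d) points; means: (k, d) component means.
    """
    diff = means[None, :, :] - x[:, None, :]              # (n, k, d): mu_j - x
    logw = -0.5 * np.sum(diff**2 / std**2, axis=-1)       # log p(x | j) up to a constant
    logw -= logw.max(axis=1, keepdims=True)               # stabilize before exp
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)                     # responsibilities p(j | x)
    return np.einsum("nk,nkd->nd", w, diff) / std**2     # sum_j p(j|x) (mu_j - x) / s^2


def sample_mixture(rng, n, means, std):
    """Draw n exact samples from the mixture above."""
    idx = rng.integers(len(means), size=n)
    return means[idx] + std * rng.standard_normal((n, means.shape[1]))


def langevin(rng, x0, score, tau, dt):
    """Euler-Maruyama for dX = score(X) dt + sqrt(2) dW, started at x0, run for time tau."""
    x = x0.copy()
    for _ in range(int(round(tau / dt))):
        x = x + score(x) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)
    return x


def readouts(x0, xt):
    """SSA and DA-style directions from pairs (X_0, X_tau), under the assumed definitions."""
    mu = x0.mean(axis=0)
    a, b = x0 - mu, xt - mu
    n = len(a)
    c0 = a.T @ a / n                                      # C(0)
    ct = a.T @ b / n                                      # C(tau)
    ct = 0.5 * (ct + ct.T)                                # reversibility => C(tau) symmetric
    e0, v0 = np.linalg.eigh(c0)
    w = v0 @ np.diag(e0 ** -0.5) @ v0.T                   # C(0)^{-1/2}
    lam, u = np.linalg.eigh(w @ ct @ w)                   # spectrum of whitened R(tau)
    order = np.argsort(lam)[::-1]                         # most metastable first
    lam, u = lam[order], u[:, order]
    ssa = float(np.sum(lam**2))                           # ||R(tau)||_F^2
    da = w @ u                                            # columns solve C(tau) v = lam C(0) v
    da /= np.linalg.norm(da, axis=0, keepdims=True)
    return ssa, da, lam


rng = np.random.default_rng(0)
means = np.array([[-2.0, 0.0], [2.0, 0.0]])              # x1: two narrow, well-separated modes
std = np.array([0.3, 4.0])                               # x2: one wide mode with larger variance
n, tau, dt = 5000, 2.0, 0.01

x0 = sample_mixture(rng, n, means, std)
xt = langevin(rng, x0, lambda x: mixture_score(x, means, std), tau, dt)
ssa_mix, da_mix, lam_mix = readouts(x0, xt)

# Barrier-free null: a single Gaussian with the same mean and covariance as the mixture.
var = np.array([2.0**2 + 0.3**2, 4.0**2])
g_means = np.zeros((1, 2))
g0 = sample_mixture(rng, n, g_means, np.sqrt(var))
gt = langevin(rng, g0, lambda x: mixture_score(x, g_means, np.sqrt(var)), tau, dt)
ssa_g, da_g, lam_g = readouts(g0, gt)

pca_top = np.linalg.eigh(np.cov(x0, rowvar=False))[1][:, -1]   # largest-variance direction
print(f"SSA mixture={ssa_mix:.3f}  SSA Gaussian={ssa_g:.3f}")
print("top DA direction:", np.round(da_mix[:, 0], 3), " top PCA direction:", np.round(pca_top, 3))
```

The two distributions share the same covariance, so PCA cannot tell them apart. With these settings, however, the mixture's SSA should come out clearly higher than the Gaussian's. Its top DA direction should also point along the bimodal x1 axis, while PCA picks the wider x2 axis. In the paper's high-dimensional settings the analytic score would be replaced by one obtained from a pretrained score-based model.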
