ITITSTTHMar 22

Dual Representation of Minimum Divergence Under Integral Constraints

arXiv:2603.2102718.8h-index: 6
Predicted impact top 53% in IT · last 90 daysOriginality Incremental advance
AI Analysis

This work addresses a foundational problem in statistics and machine learning, with applications in sequential inference, bandit theory, and distributionally robust optimization, though it is incremental as it builds on and extends prior techniques.

The paper tackles the problem of deriving dual representations for minimum divergence under integral constraints, which is crucial for converting information-theoretic lower bounds into tractable algorithms in statistics and probability. The result generalizes existing methods to arbitrary dimensions, a broad class of f-divergences, and general integral constraints, enabling optimal procedures for sequential testing, estimation, and change detection.

Minimum divergence problems under integral constraints appear throughout statistics and probability, including sequential inference, bandit theory, and distributionally robust optimization. In many such settings, dual representations are the key step that convert information-theoretic lower bounds into computationally tractable (and often near-optimal) algorithms. In this paper, we present a general two-stage recipe for deriving dual representations of constrained minimum divergence (in the second argument) for distributions supported on $[0,1]^K$. The first stage derives a dual representation for finitely-supported distributions using classical finite-dimensional convex duality techniques, while the second establishes an abstract interchange argument that lifts this discretized dual to arbitrary distributions. We begin with the simplest case of mean-constrained minimum relative entropy, commonly called $\mathrm{KL}_{\inf}$, and generalize an existing argument from multi-armed bandits literature for $K=1$ to arbitrary dimensions. Our main contribution is to significantly expand the scope of this approach to a broad class of $f$-divergences (beyond relative entropy) and to general integral constraint functionals (beyond the mean constraint). Finally, we illustrate the statistical implications of our results by constructing optimal procedures for sequential testing, estimation, and change detection with observations in $[0,1]^K$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes