MLDSLGMar 4

Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions

arXiv:2603.04635v1
Originality Highly original
AI Analysis

This work offers a significant improvement in sample efficiency for independence testing, benefiting researchers and practitioners in statistical inference by reducing the data requirements for this fundamental task.

This paper addresses the problem of testing independence of distributions, a task that typically requires a sample complexity scaling polynomially with support size. The authors introduce prediction-augmented independence testers that remain robust to prediction quality while significantly improving sample efficiency when predictions are accurate.

Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $ε$-far from all product distributions in total variation distance. In the non-parametric finite-sample regime, this task is notoriously expensive, as the minimax sample complexity scales polynomially with the support size. In this work, we move beyond these worst-case limitations by leveraging the framework of \textit{augmented distribution testing}. We design independence testers that incorporate auxiliary, but potentially untrustworthy, predictive information. Our framework ensures that the tester remains robust, maintaining worst-case validity regardless of the prediction's quality, while significantly improving sample efficiency when the prediction is accurate. Our main contributions include: (i) a bivariate independence tester for discrete distributions that adaptively reduces sample complexity based on the prediction error; (ii) a generalization to the high-dimensional multivariate setting for testing the independence of $d$ random variables; and (iii) matching minimax lower bounds demonstrating that our testers achieve optimal sample complexity.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes