ML · AI · LG · SP · PR · AP · Jul 22, 2022

Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis

arXiv:2207.10939v1 · 14 citations · h-index: 60
Originality: Synthesis-oriented
AI Analysis

This work develops a theoretical account of error rates in ML classification, offering a framework to compute and numerically test convergence properties, though it appears incremental in that it applies existing large deviations theory to ML contexts.

The paper uses large deviations theory to analyze the rate at which the error probability of machine learning classifiers converges to zero. It provides mathematical conditions for exponential decay and shows how these conditions can be verified numerically on available datasets.

We study the performance -- and specifically the rate at which the error probability converges to zero -- of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions for an ML classifier to exhibit error probabilities that vanish exponentially, say $\sim \exp\left(-n\,I + o(n) \right)$, where $n$ is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and $I$ is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., what is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and, consequently, the related error rate $I$, depend on the given training set, which is assumed to be of finite size. Interestingly, these conditions can be verified and tested numerically by exploiting the available dataset, or a synthetic dataset generated according to the available information on the underlying statistical model. In other words, the convergence of the classification error probability to zero, and its rate, can be computed on a portion of the dataset available for training. Consistently with large deviations theory, we can also establish the convergence, for $n$ large enough, of the normalized D3F statistic to a Gaussian distribution. This property is exploited to set a desired asymptotic false alarm probability, which empirically turns out to be accurate even for quite realistic values of $n$. Furthermore, approximate error probability curves $\sim \zeta_n \exp\left(-n\,I \right)$ are provided, thanks to the refined asymptotic derivation (often referred to as exact asymptotics), where $\zeta_n$ represents the most representative sub-exponential terms of the error probabilities.
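The numerical verification described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure. It assumes, for illustration only, that the decision statistic is the average of $n$ i.i.d. per-observation scores, so that the rate at threshold $\gamma$ is the Fenchel-Legendre transform $I(\gamma) = \sup_t \left[t\gamma - \log \mathbb{E}\, e^{t s}\right]$, estimated from held-out scores. The function names `empirical_cgf` and `rate_function`, the grid over $t$, and the synthetic standard-normal scores (for which $I(\gamma) = \gamma^2/2$) are all hypothetical choices.

```python
import numpy as np


def empirical_cgf(scores, t):
    """Empirical cumulant-generating function log mean(exp(t * s)) for each t."""
    scores = np.asarray(scores, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    z = t[:, None] * scores[None, :]          # shape (len(t), len(scores))
    zmax = z.max(axis=1, keepdims=True)       # shift for a stable log-mean-exp
    return zmax[:, 0] + np.log(np.mean(np.exp(z - zmax), axis=1))


def rate_function(scores, gamma, t_grid):
    """Grid approximation of sup_t [t * gamma - cgf(t)]."""
    t_grid = np.asarray(t_grid, dtype=float)
    return float(np.max(t_grid * gamma - empirical_cgf(scores, t_grid)))


# Synthetic stand-in for per-observation scores under the null hypothesis
# (an assumption made only so the result can be checked against gamma**2 / 2).
rng = np.random.default_rng(0)
scores = rng.standard_normal(50_000)
t_grid = np.linspace(0.0, 2.0, 41)            # t >= 0 targets the upper tail

rate = rate_function(scores, gamma=1.0, t_grid=t_grid)
print(f"estimated rate at gamma=1: {rate:.3f} (theory for N(0,1): 0.500)")
```

Restricting the grid to $t \ge 0$ targets the upper-tail event that the averaged score exceeds $\gamma$. The empirical transform becomes unreliable when the maximizing $t$ is large relative to what the sample size can support, so the grid range should be chosen with the amount of available data in mind.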

