ST ML · Jun 24, 2021

Three rates of convergence or separation via U-statistics in a dependent framework

Quentin Duchemin, Yohann De Castro, Claire Lacour

arXiv:2106.12796v2 · 1.2

Originality: Incremental advance

AI Analysis

This work addresses theoretical gaps in statistical inference for dependent data, offering incremental advancements in specific domains like MCMC and online learning.

The paper tackles the non-asymptotic analysis of U-statistics for dependent data, specifically applying a new concentration inequality to three problems: establishing an exponential inequality for estimating spectra of trace class integral operators with MCMC methods, analyzing generalization performance of online algorithms with pairwise loss functions and Markov chain samples, and providing a non-asymptotic analysis of a goodness-of-fit test for Markov chain densities.

Despite the ubiquity of U-statistics in modern Probability and Statistics, their non-asymptotic analysis in a dependent framework may have been overlooked. In a recent work, a new concentration inequality for U-statistics of order two for uniformly ergodic Markov chains has been proved. In this paper, we put this theoretical breakthrough into action by pushing further the current state of knowledge in three different active fields of research. First, we establish a new exponential inequality for the estimation of spectra of trace class integral operators with MCMC methods. The novelty is that this result holds for kernels with positive and negative eigenvalues, which is new as far as we know. In addition, we investigate generalization performance of online algorithms working with pairwise loss functions and Markov chain samples. We provide an online-to-batch conversion result by showing how we can extract a low risk hypothesis from the sequence of hypotheses generated by any online learner. We finally give a non-asymptotic analysis of a goodness-of-fit test on the density of the invariant measure of a Markov chain. We identify some classes of alternatives over which our test based on the $L_2$ distance has a prescribed power.
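For readers unfamiliar with the central object, the following is a minimal sketch of an order-two U-statistic, $U_n = \frac{2}{n(n-1)} \sum_{i<j} h(X_i, X_j)$, computed along a simulated Markov chain. It is an illustration only, not code from the paper: the two-state transition matrix, the indicator kernel, and the function names `simulate_chain` and `u_statistic_order_two` are all assumptions chosen for the example. A finite-state chain with strictly positive transition probabilities is used because such a chain is uniformly ergodic, which is the dependence setting named in the abstract.

```python
import random


def simulate_chain(n, transition, start=0, seed=0):
    """Simulate n steps of a finite-state Markov chain.

    `transition[s]` holds the probabilities of moving from state s to each
    state. The matrix used below is a hypothetical example, not the paper's.
    """
    rng = random.Random(seed)
    states = range(len(transition))
    state = start
    chain = []
    for _ in range(n):
        chain.append(state)
        state = rng.choices(states, weights=transition[state])[0]
    return chain


def u_statistic_order_two(sample, kernel):
    """Average a symmetric kernel h over all unordered pairs i < j."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += kernel(sample[i], sample[j])
    return 2.0 * total / (n * (n - 1))


def same_state(x, y):
    """Illustrative symmetric kernel: 1 if both observations coincide."""
    return 1.0 if x == y else 0.0


# All entries are strictly positive, so the chain is uniformly ergodic.
P = [[0.9, 0.1],
     [0.2, 0.8]]

chain = simulate_chain(500, P)
estimate = u_statistic_order_two(chain, same_state)
print(estimate)
```

With this kernel the statistic estimates the probability that two independent draws from the invariant measure coincide. The concentration inequality discussed in the abstract concerns how tightly such a statistic clusters around its target when the observations come from a chain rather than from independent draws.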
