LGSTJun 2, 2021

Statistical optimality conditions for compressive ensembles

arXiv:2106.01092v12 citations
Originality Incremental advance
AI Analysis

This work provides theoretical guarantees for compressive ensembles, addressing the problem of high-dimensional data analysis for researchers in machine learning theory, though it is incremental as it builds on existing compression frameworks.

The paper tackles the theoretical analysis of ensembles trained on compressed high-dimensional data, establishing distribution-dependent upper bounds on excess risk and showing that compressive classification and regression achieve minimax-optimal rates under mild geometric conditions.

We present a framework for the theoretical analysis of ensembles of low-complexity empirical risk minimisers trained on independent random compressions of high-dimensional data. First we introduce a general distribution-dependent upper-bound on the excess risk, framed in terms of a natural notion of compressibility. This bound is independent of the dimension of the original data representation, and explains the in-built regularisation effect of the compressive approach. We then instantiate this general bound to classification and regression tasks, considering Johnson-Lindenstrauss mappings as the compression scheme. For each of these tasks, our strategy is to develop a tight upper bound on the compressibility function, and by doing so we discover distributional conditions of geometric nature under which the compressive algorithm attains minimax-optimal rates up to at most poly-logarithmic factors. In the case of compressive classification, this is achieved with a mild geometric margin condition along with a flexible moment condition that is significantly more general than the assumption of bounded domain. In the case of regression with strongly convex smooth loss functions we find that compressive regression is capable of exploiting spectral decay with near-optimal guarantees. In addition, a key ingredient for our central upper bound is a high probability uniform upper bound on the integrated deviation of dependent empirical processes, which may be of independent interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes