LG, ML · Mar 15, 2022

Approximability and Generalisation

arXiv:2203.07989v1 · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the challenge of ensuring generalization in model compression for small devices. It offers theoretical insights and algorithms that could reduce labeled-data requirements, although extending statistical learning theory to modern compression approaches appears to be an incremental advance.

The paper tackles the problem of explaining and guaranteeing good generalization for approximate learning machines, such as quantized or compressed predictors, by studying the role of approximability in learning. It proves upper bounds on generalization, showing that under mild conditions approximable target concepts can be learned from smaller labeled samples given sufficient unlabeled data. It also provides algorithms that yield a good predictor whose approximation enjoys the same generalization guarantees.
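For intuition on why restricting attention to approximable predictors can save labels, consider the textbook realisable-case bound for a finite hypothesis class H. This is a standard result, not the paper's theorem; the sensitivity measure s(h), the threshold τ and the subclass H_τ below are hypothetical notation introduced only for illustration. Any hypothesis ĥ consistent with m labelled examples satisfies:

```latex
m \;\ge\; \frac{1}{\varepsilon}\left(\ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta}\right)
\quad\Longrightarrow\quad
\Pr\!\left[\operatorname{err}(\hat h) \le \varepsilon\right] \;\ge\; 1-\delta .
```

If unlabelled data can be used to estimate s(h) and so restrict the search to \mathcal{H}_\tau = \{h \in \mathcal{H} : s(h) \le \tau\} \subseteq \mathcal{H}, and the target lies in \mathcal{H}_\tau, then \ln\lvert\mathcal{H}\rvert is replaced by \ln\lvert\mathcal{H}_\tau\rvert \le \ln\lvert\mathcal{H}\rvert, so fewer labelled examples suffice for the same ε and δ.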

Approximate learning machines have become popular in the era of small devices, including quantised, factorised, hashed, or otherwise compressed predictors, and the quest to explain and guarantee good generalisation abilities for such methods has just begun. In this paper we study the role of approximability in learning, both in the full-precision and the approximated settings of the predictor that is learned from the data, through a notion of sensitivity of predictors to the action of the approximation operator at hand. We prove upper bounds on the generalisation of such predictors, yielding the following main findings for any PAC-learnable class and any given approximation operator:

1) Under mild conditions, approximable target concepts are learnable from a smaller labelled sample, provided sufficient unlabelled data are available.
2) We give algorithms that guarantee a good predictor whose approximation also enjoys the same generalisation guarantees.
3) We highlight natural examples of structure in the class of sensitivities that reduce, and possibly even eliminate, the otherwise abundant requirement of additional unlabelled data, and thereby shed new light on what makes one problem instance easier to learn than another.

These results embed the scope of modern model compression approaches into the general goal of statistical learning theory, which in turn suggests appropriate algorithms through minimising uniform bounds.
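The abstract does not spell out the approximation operator or the sensitivity measure, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes linear classifiers, a hypothetical 2-bit uniform weight quantiser as the approximation operator A, and sensitivity taken as the rate at which h and A(h) disagree on unlabelled inputs. The names `quantise`, `sensitivity`, `select` and the threshold `tau` are hypothetical. The point it illustrates is that disagreement between h and A(h) can be estimated without labels, so plentiful unlabelled data can filter the candidates before a small labelled sample is used for empirical risk minimisation.

```python
import numpy as np

def quantise(w, bits=2):
    """Hypothetical operator A: uniform quantisation of w to 2**bits levels in [-max|w|, max|w|]."""
    scale = np.max(np.abs(w))
    if scale == 0:
        return w.copy()
    levels = 2 ** bits - 1
    q = np.round((w / scale + 1) / 2 * levels)  # map to [0, levels] and snap to the grid
    return (q / levels * 2 - 1) * scale         # map grid points back to [-scale, scale]

def predict(w, X):
    """Linear classifier with labels in {-1, +1}."""
    return np.where(X @ w >= 0, 1, -1)

def sensitivity(w, X_unlab, bits=2):
    """Assumed proxy: disagreement rate of h and A(h) on unlabelled inputs (no labels needed)."""
    return float(np.mean(predict(w, X_unlab) != predict(quantise(w, bits), X_unlab)))

def empirical_risk(w, X, y):
    """Fraction of labelled examples misclassified by w."""
    return float(np.mean(predict(w, X) != y))

def select(candidates, X_lab, y_lab, X_unlab, tau, bits=2):
    """ERM restricted to candidates whose estimated sensitivity is at most tau."""
    admissible = [w for w in candidates if sensitivity(w, X_unlab, bits) <= tau]
    if not admissible:
        raise ValueError("no candidate meets the sensitivity threshold")
    return min(admissible, key=lambda w: empirical_risk(w, X_lab, y_lab))

# Toy demo: the target weights lie exactly on the 2-bit grid, so the target is "approximable".
rng = np.random.default_rng(0)
d = 5
w_true = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
candidates = [w_true + 0.3 * rng.standard_normal(d) for _ in range(50)] + [w_true]
X_unlab = rng.standard_normal((2000, d))  # plentiful unlabelled data
X_lab = rng.standard_normal((20, d))      # small labelled sample
y_lab = predict(w_true, X_lab)
best = select(candidates, X_lab, y_lab, X_unlab, tau=0.05)
print(sensitivity(best, X_unlab), empirical_risk(best, X_lab, y_lab))
```

The threshold filter is the simplest design choice that makes the role of unlabelled data visible; a bound-minimising algorithm of the kind the abstract mentions would more plausibly trade empirical risk against sensitivity rather than apply a hard cut-off.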
