CL, AI, LG · Apr 18, 2021

Consistent Accelerated Inference via Confident Adaptive Transformers

arXiv:2104.08803v2 · 682 citations
Originality: Incremental advance
AI Analysis

The paper addresses the computational inefficiency of large Transformers in NLP. It appears incremental: it builds on existing acceleration methods and adds consistency guarantees.

The paper tackles the unpredictable performance costs of accelerated inference for large Transformers. It introduces Confident Adaptive Transformers (CATs), which stop computation early on a per-input basis using a meta consistency classifier calibrated with conformal prediction. The result is higher efficiency with a guaranteed, user-specified degree of consistency with the original model.

We develop a novel approach for confidently accelerating inference in the large and expensive multilayer Transformers that are now ubiquitous in natural language processing (NLP). Amortized or approximate computational methods increase efficiency, but can come with unpredictable performance costs. In this work, we present CATs -- Confident Adaptive Transformers -- in which we simultaneously increase computational efficiency, while guaranteeing a specifiable degree of consistency with the original model with high confidence. Our method trains additional prediction heads on top of intermediate layers, and dynamically decides when to stop allocating computational effort to each input using a meta consistency classifier. To calibrate our early prediction stopping rule, we formulate a unique extension of conformal prediction. We demonstrate the effectiveness of this approach on four classification and regression tasks.
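
The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the input layout, and the particular calibration rule (a split-conformal-style quantile over the highest score assigned to any inconsistent early layer) are assumptions made for illustration. It assumes each early layer already has a prediction and a meta-classifier consistency score.

```python
import math


def calibrate_threshold(cal_scores, cal_preds, epsilon):
    """Pick an exit threshold from held-out calibration examples.

    cal_scores[i][k]: hypothetical meta-classifier score for early layer k.
    cal_preds[i]: predictions of each early layer, then the final layer's.
    epsilon: tolerated rate of early exits that disagree with the full model.
    """
    worst = []
    for scores, preds in zip(cal_scores, cal_preds):
        final = preds[-1]
        # Scores given to early layers that disagree with the full model.
        bad = [s for s, p in zip(scores, preds[:-1]) if p != final]
        worst.append(max(bad) if bad else float("-inf"))
    worst.sort()
    n = len(worst)
    # Conformal-style rank with the usual finite-sample (n + 1) correction.
    rank = math.ceil((n + 1) * (1 - epsilon))
    if rank > n:
        return float("inf")  # too little data: never exit early
    return worst[rank - 1]


def early_exit(scores, preds, threshold):
    """Return (layer index, prediction) of the first sufficiently confident layer."""
    for layer, (score, pred) in enumerate(zip(scores, preds[:-1])):
        if score > threshold:
            return layer, pred
    # No early layer was confident enough: fall back to the full model.
    return len(preds) - 1, preds[-1]
```

Under this sketch, an early exit happens only when the score exceeds every inconsistent score seen in roughly a 1 - epsilon fraction of calibration examples. A smaller epsilon therefore trades speed for consistency with the full model.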

Code Implementations: 1 repo