CL AI LG · Apr 18, 2021

Consistent Accelerated Inference via Confident Adaptive Transformers

Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay

arXiv:2104.08803v231.5682 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the computational cost of running large Transformers for NLP inference. The contribution appears incremental: it builds on existing acceleration methods and adds consistency guarantees.

The paper tackles the unpredictable performance costs of accelerated inference for large Transformers. It introduces Confident Adaptive Transformers (CATs), which stop computation early on a per-input basis using a meta consistency classifier whose stopping rule is calibrated with an extension of conformal prediction. The result is more efficient inference that remains consistent with the original model to a specifiable degree, with high confidence.

We develop a novel approach for confidently accelerating inference in the large and expensive multilayer Transformers that are now ubiquitous in natural language processing (NLP). Amortized or approximate computational methods increase efficiency, but can come with unpredictable performance costs. In this work, we present CATs -- Confident Adaptive Transformers -- in which we simultaneously increase computational efficiency, while guaranteeing a specifiable degree of consistency with the original model with high confidence. Our method trains additional prediction heads on top of intermediate layers, and dynamically decides when to stop allocating computational effort to each input using a meta consistency classifier. To calibrate our early prediction stopping rule, we formulate a unique extension of conformal prediction. We demonstrate the effectiveness of this approach on four classification and regression tasks.
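The early-exit idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the per-layer scores, and the calibration rule are all assumptions made for illustration. The calibration shown is a simplified split-conformal-style quantile over the highest score assigned to any inconsistent early layer, and may differ from the extension of conformal prediction the paper actually formulates.

```python
import math
from typing import List, Sequence, Tuple


def calibrate_threshold(
    cal_scores: Sequence[Sequence[float]],
    cal_consistent: Sequence[Sequence[bool]],
    epsilon: float,
) -> float:
    """Pick an exit threshold from held-out calibration data (hypothetical helper).

    cal_scores[i][k]: meta-classifier score for example i at early layer k.
    cal_consistent[i][k]: True if the early head at layer k agrees with the full model.
    epsilon: tolerated rate of inconsistent early exits.
    """
    worst: List[float] = []
    for scores, consistent in zip(cal_scores, cal_consistent):
        # Highest score given to a layer that would have been an inconsistent exit.
        bad = [s for s, ok in zip(scores, consistent) if not ok]
        worst.append(max(bad) if bad else float("-inf"))
    worst.sort()
    n = len(worst)
    # Standard split-conformal rank with the finite-sample (n + 1) correction.
    rank = math.ceil((n + 1) * (1 - epsilon))
    if rank > n:
        return float("inf")  # too little calibration data: never exit early
    return worst[rank - 1]


def early_exit_predict(
    layer_predictions: Sequence[int],
    early_scores: Sequence[float],
    threshold: float,
) -> Tuple[int, int]:
    """Return (prediction, exit_layer).

    layer_predictions has one entry per layer; the last entry is the full model.
    early_scores has one entry per early layer (all layers except the last).
    """
    for k, score in enumerate(early_scores):
        if score > threshold:  # strict: exit only when the score exceeds the threshold
            return layer_predictions[k], k
    last = len(layer_predictions) - 1
    return layer_predictions[last], last


# Toy calibration set: 4 examples, 2 early layers each (made-up numbers).
cal_scores = [[0.9, 0.95], [0.2, 0.8], [0.6, 0.7], [0.3, 0.9]]
cal_consistent = [[True, True], [False, True], [False, False], [False, True]]

tau = calibrate_threshold(cal_scores, cal_consistent, epsilon=0.25)
prediction, exit_layer = early_exit_predict([1, 0, 0], [0.5, 0.8], tau)
```

A smaller `epsilon` pushes the threshold up, so fewer inputs exit early. With too few calibration examples the threshold becomes infinite and every input runs through the full model, which is trivially consistent.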
