CL Mar 27, 2022

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection

arXiv:2203.14380v1643 citations h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses the problem of high computational costs in NLP models for researchers and practitioners, offering an incremental improvement over existing token selection heuristics.

The paper tackles the computational cost of Transformer-based language models such as BERT by introducing Pyramid-BERT, which uses core-set based token selection to shorten the sequence successively as tokens pass through the encoders. The authors report advantages over several baselines and existing methods on the GLUE benchmarks and Long Range Arena datasets.

Transformer-based language models such as BERT have achieved state-of-the-art performance on various NLP tasks, but are computationally prohibitive. A recent line of work uses various heuristics to successively shorten the sequence length while transforming tokens through encoders, in tasks such as classification and ranking that require a single token embedding for prediction. We present a novel solution to this problem, called Pyramid-BERT, in which we replace previously used heuristics with a core-set based token selection method justified by theoretical results. The core-set based token selection technique allows us to avoid expensive pre-training and enables space-efficient fine-tuning, which makes it suitable for handling longer sequence lengths. We provide extensive experiments establishing the advantages of Pyramid-BERT over several baselines and existing works on the GLUE benchmarks and Long Range Arena datasets.
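The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one common way to build a core-set: greedy k-center (farthest-point) selection over token embeddings. The function name `greedy_core_set`, the use of Euclidean distance, and the convention that index 0 holds the [CLS] token and is always kept are assumptions made for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence


def greedy_core_set(tokens: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return k token indices chosen by greedy k-center selection.

    Assumption: tokens[0] is the [CLS] embedding and is always kept.
    Each step adds the token farthest from everything selected so far.
    """
    n = len(tokens)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(tokens)")

    selected = [0]
    chosen = {0}
    # min_dist[i] = distance from token i to its nearest selected token
    min_dist = [math.dist(tokens[i], tokens[0]) for i in range(n)]

    while len(selected) < k:
        # Only unselected tokens are candidates; ties go to the lowest index.
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Refresh nearest-selected distances against the new center.
        for i in range(n):
            d = math.dist(tokens[i], tokens[nxt])
            if d < min_dist[i]:
                min_dist[i] = d

    # Sort so the surviving tokens keep their original order.
    return sorted(selected)


# Toy 2-D "embeddings": two near-duplicate pairs plus one outlier.
toy_tokens = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
kept = greedy_core_set(toy_tokens, 3)
print(kept)  # [0, 2, 4]
```

In this toy input the near-duplicate tokens at indices 1 and 3 are dropped, which is the intuition behind core-set selection: keep a subset that still covers the embedding space. A pyramid-style encoder would, under the same assumptions, apply such a selection between layers with a decreasing `k`.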

