CL · Nov 2, 2018

Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

arXiv:1811.01088v2 · 484 citations
Originality: Incremental advance
AI Analysis

This work targets better performance on language understanding tasks, particularly in data-constrained regimes. It is an incremental advance because it builds on existing pretrained models such as BERT and ELMo.

The paper improves pretrained sentence encoders by following unsupervised pretraining with a stage of supervised training on data-rich tasks such as natural language inference. Applied to BERT, this yields a GLUE score of 81.8, which was state of the art as of 02/24/2019 and 1.4 points above BERT, along with reduced variance across random restarts.

Pretraining sentence encoders with language modeling and related unsupervised tasks has recently been shown to be very effective for language understanding tasks. By supplementing language model-style pretraining with further training on data-rich supervised tasks, such as natural language inference, we obtain additional performance improvements on the GLUE benchmark. Applying supplementary training on BERT (Devlin et al., 2018), we attain a GLUE score of 81.8, the state of the art (as of 02/24/2019) and a 1.4-point improvement over BERT. We also observe reduced variance across random restarts in this setting. Our approach yields similar improvements when applied to ELMo (Peters et al., 2018a) and Radford et al. (2018)'s model. In addition, the benefits of supplementary training are particularly pronounced in data-constrained regimes, as we show in experiments with artificially limited training data.
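
The three-stage recipe described in the abstract (pretrained encoder, then supplementary training on a data-rich intermediate task, then fine-tuning on the target task) can be illustrated with a toy sketch. This is not the authors' implementation: a shared weight vector stands in for the sentence encoder, a per-task bias stands in for the task-specific output layer, and the synthetic tasks, the helper names (`train`, `make_task`, `stilts`) and all hyperparameters are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(encoder, examples, epochs=20, lr=0.1):
    """Train on one task with SGD; return (updated encoder copy, task head)."""
    w = list(encoder)  # copy, so the caller's weights are left untouched
    b = 0.0            # fresh task-specific head for every task
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def accuracy(w, b, examples):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    hits = 0
    for x, y in examples:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        hits += int((p >= 0.5) == bool(y))
    return hits / len(examples)


def make_task(rng, true_w, n):
    """Hypothetical synthetic binary task defined by a linear rule true_w."""
    examples = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = 1 if sum(t * xi for t, xi in zip(true_w, x)) > 0 else 0
        examples.append((x, y))
    return examples


def stilts(pretrained, intermediate_data, target_data):
    """Supplementary training on an intermediate task, then target fine-tuning."""
    # Stage 2: train on the data-rich intermediate task; its head is discarded.
    after_intermediate, _ = train(pretrained, intermediate_data)
    # Stage 3: fine-tune the same encoder on the small target task with a new head.
    return train(after_intermediate, target_data)


rng = random.Random(0)
pretrained = [0.0, 0.0, 0.0]                           # stand-in for stage 1
intermediate = make_task(rng, [1.0, -1.0, 0.5], 500)   # data-rich related task
target_train = make_task(rng, [1.0, -0.8, 0.4], 10)    # data-constrained target
w, b = stilts(pretrained, intermediate, target_train)
```

Calling `train(pretrained, target_train)` directly gives the baseline without supplementary training, and `accuracy` can be used to compare the two on held-out data from `make_task`. Any gap seen in this toy setting says nothing about the GLUE numbers reported in the paper.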

Code Implementations: 1 repo
