CL · Nov 5, 2019

Deepening Hidden Representations from Pre-trained Language Models

arXiv:1911.01940v2 · 11 citations
Originality: Highly original
AI Analysis

This paper addresses a bottleneck in natural language understanding: it improves model performance by making fuller use of the representations a pre-trained model has already learned.

The paper tackles the limitation of using only the final-layer output of a pre-trained language model. It proposes HIRE, a method that fuses hidden representations from multiple layers, and reports performance that rivals state-of-the-art models on the GLUE benchmark.

Transformer-based pre-trained language models have proven effective for learning contextualized language representations. However, current approaches use only the output of the encoder's final layer when fine-tuning on downstream tasks. We argue that taking only a single layer's output restricts the power of the pre-trained representation. We therefore deepen the representation learned by the model by fusing hidden representations through an explicit HIdden Representation Extractor (HIRE), which automatically absorbs representations complementary to the output of the final layer. With RoBERTa as the backbone encoder, our proposed improvement over pre-trained models is shown to be effective on multiple natural language understanding tasks and helps our model rival state-of-the-art models on the GLUE benchmark.
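
The idea of combining intermediate-layer representations with the final-layer output can be illustrated with a minimal sketch. This is not the paper's HIRE module: a softmax-weighted average over the lower layers stands in for the extractor, and plain concatenation stands in for the fusion step. The names `fuse_hidden_layers` and `layer_scores` are hypothetical, and the scores would be learned parameters in a real model.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_hidden_layers(hidden_states, layer_scores):
    """Fuse per-layer vectors for a single token.

    hidden_states: list of vectors, one per encoder layer; the last
        entry is the final-layer output.
    layer_scores: one raw score per non-final layer (assumed to be
        learned during fine-tuning; fixed here for illustration).

    Returns the final-layer vector concatenated with a weighted
    average of the lower layers (the "complementary" part).
    """
    *lower, final = hidden_states
    if len(lower) != len(layer_scores):
        raise ValueError("need one score per non-final layer")
    weights = softmax(layer_scores)
    complementary = [
        sum(w * h[i] for w, h in zip(weights, lower))
        for i in range(len(final))
    ]
    return final + complementary  # list concatenation, not addition


# Three layers, two dimensions each; equal scores give equal weights.
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(fuse_hidden_layers(states, [0.0, 0.0]))  # [2.0, 2.0, 0.5, 0.5]
```

Using only `states[-1]` corresponds to the final-layer-only baseline the abstract criticises. The extra half of the fused vector is what a downstream classifier gains from the lower layers.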

