CL AI Sep 13, 2022

Don't Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling

arXiv:2209.05972v1586 citations h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making effective use of multi-layer features in language models, a practical concern for NLP practitioners. It is incremental in that it builds on existing contrastive learning methods.

The paper tackles the problem of deriving sentence representations from pre-trained language models by introducing an attention-based pooling strategy that preserves layer-wise signals. It reports improved performance on semantic textual similarity and semantic search tasks over the contrastively trained BERT_base baseline and its variants.
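To make the idea of pooling across layers concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the source does not specify the architecture, so the mean pooling over tokens, the single query vector `query`, and the function name `layerwise_attention_pool` are all assumptions chosen for illustration. In a real model the query would be a learned parameter and the layer states would come from the PLM.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layerwise_attention_pool(layer_states, query):
    """Pool hidden states from every layer into one sentence vector.

    layer_states: array of shape (num_layers, seq_len, hidden)
    query: array of shape (hidden,), a hypothetical learned vector
    """
    # Mean-pool tokens within each layer -> (num_layers, hidden).
    per_layer = layer_states.mean(axis=1)
    # Scaled dot-product score for each layer -> (num_layers,).
    scores = per_layer @ query / np.sqrt(per_layer.shape[-1])
    # Attention weights over layers sum to 1.
    weights = softmax(scores)
    # Weighted sum of the per-layer vectors -> (hidden,).
    pooled = weights @ per_layer
    return pooled, weights


# Random stand-ins for 12 layers, 8 tokens, hidden size 16.
rng = np.random.default_rng(0)
states = rng.normal(size=(12, 8, 16))
q = rng.normal(size=16)
sentence_vec, layer_weights = layerwise_attention_pool(states, q)
```

With an all-zero query, every layer gets the same weight and the result reduces to a plain average over layers, which is a useful sanity check against simple last-layer or mean pooling.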

Recent pre-trained language models (PLMs) have achieved great success on many natural language processing tasks by learning linguistic features and contextualized sentence representations. Because the attributes captured in the stacked layers of PLMs are not clearly identified, straightforward approaches such as taking the last-layer embedding are commonly preferred for deriving sentence representations from PLMs. This paper introduces an attention-based pooling strategy that enables the model to preserve the layer-wise signals captured in each layer and to learn digested linguistic features for downstream tasks. The contrastive learning objective adapts the layer-wise attention pooling to both unsupervised and supervised settings. It regularizes the anisotropic space of pre-trained embeddings, making it more uniform. We evaluate our model on standard semantic textual similarity (STS) and semantic search tasks. Our method improves the performance of the contrastively trained BERT_base baseline and its variants.
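The abstract does not state the exact loss, so the following is only a sketch of one common form of contrastive objective, an InfoNCE-style loss with in-batch negatives, alongside a uniformity measure of the kind often used to quantify how evenly embeddings spread over the unit sphere (log of the mean Gaussian potential over pairs). The function names, the temperature of 0.05, and `t=2.0` are assumptions for illustration, not values taken from the paper.

```python
import numpy as np


def l2_normalize(x):
    # Project each row onto the unit sphere.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def info_nce_loss(anchors, positives, temperature=0.05):
    """InfoNCE with in-batch negatives: row i of `positives` is the
    positive for row i of `anchors`; all other rows act as negatives."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature  # cosine similarities, (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" is the diagonal entry.
    return -np.mean(np.diag(log_probs))


def uniformity(embeddings, t=2.0):
    """Log mean of exp(-t * squared distance) over distinct pairs.
    Lower values indicate a more uniform (less anisotropic) space."""
    z = l2_normalize(embeddings)
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    upper = np.triu_indices(len(z), k=1)  # each unordered pair once
    return np.log(np.mean(np.exp(-t * sq_dists[upper])))


# Toy batch: four mutually orthogonal sentence vectors.
anchors = np.eye(4, 8)
loss_aligned = info_nce_loss(anchors, anchors)
loss_shuffled = info_nce_loss(anchors, np.roll(anchors, 1, axis=0))
spread = uniformity(anchors)
collapsed = uniformity(np.ones((4, 8)))
```

In this toy batch the loss is near zero when each anchor is paired with itself and large when the positives are shifted by one row, and the orthogonal vectors score lower (more uniform) than the fully collapsed ones.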

Code Implementations: 1 repo
