CL, Dec 30, 2020

Improving BERT with Syntax-aware Local Attention

arXiv:2012.15150v2721 citations
AI Analysis

This work offers an incremental improvement to BERT's attention mechanism for NLP practitioners working on single-sentence tasks, by making attention more syntactically focused.

This paper addresses a limitation of BERT's attention mechanism, its lack of focus on local regions, by introducing syntax-aware local attention, which restricts each token's attention scope according to distances in the syntactic structure. The modification yields consistent gains over standard BERT on single-sentence benchmarks, including sentence classification and sequence labeling tasks.

Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on a variety of NLP tasks. Recent works have shown that attention-based models can benefit from more focused attention over local regions. Most of them restrict the attention scope to a linear span, or are confined to certain tasks such as machine translation and question answering. In this paper, we propose a syntax-aware local attention, where the attention scopes are restricted based on the distances in the syntactic structure. The proposed syntax-aware local attention can be integrated with pretrained language models, such as BERT, to make the model focus on syntactically relevant words. We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks. Experimental results show consistent gains over BERT on all benchmark datasets. Extensive studies verify that our model achieves better performance owing to more focused attention over syntactically relevant words.
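To make the idea of restricting attention by syntactic distance concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how distances are computed or thresholded, so the sketch assumes a dependency tree given as a list of head indices, measures distance as path length in that tree, and uses a hypothetical cutoff `max_dist`. The function names and the example parse are likewise made up for illustration.

```python
import math
from collections import deque


def tree_distances(heads):
    """All-pairs path lengths in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root
    (an assumed input format, not taken from the paper).
    """
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)

    dist = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        # Breadth-first search from each token over the undirected tree.
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] == math.inf:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def syntax_local_mask(heads, max_dist):
    """mask[i][j] is True when token j lies within max_dist of token i."""
    return [[d <= max_dist for d in row] for row in tree_distances(heads)]


def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to masked-out positions."""
    out = []
    for row, allowed in zip(scores, mask):
        peak = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Hand-written parse of "The old dog chased cats" (illustrative only):
# The -> dog, old -> dog, dog -> chased, chased = root, cats -> chased.
heads = [2, 2, 3, -1, 3]
mask = syntax_local_mask(heads, max_dist=1)

# Uniform raw scores, so the effect of the mask is easy to see.
scores = [[0.0] * len(heads) for _ in heads]
weights = masked_softmax(scores, mask)
print(mask[0])
print(weights[0])
```

With this parse and `max_dist=1`, "The" keeps only itself and its head "dog", while the linearly adjacent "old" is masked out because it is two edges away in the tree. That is the contrast with a linear-span window that the abstract draws. In a real model the mask would be applied to the attention logits of one or more BERT layers, which this sketch does not attempt.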

Code Implementations: 1 repo