CL · Oct 21, 2022

Syntax-guided Localized Self-attention by Constituency Syntactic Distance

Georgia Tech · Meta AI · MILA
arXiv:2210.11759v1291 citations · h-index: 24
Originality: Incremental advance
AI Analysis

This work targets sample efficiency and translation quality in machine translation. It is an incremental advance, since it builds on existing Transformer and constituency-parser methods.

The authors address Transformers' dependence on large-scale training data for learning syntactic information by proposing a syntax-guided localized self-attention mechanism that incorporates structures from an external constituency parser. The reported result is a consistent improvement in translation performance across machine translation datasets of small to large size and with different source languages.

Recent work has revealed that Transformers implicitly learn syntactic information in their lower layers from data, although this learning is highly dependent on the quality and scale of the training data. However, learning syntactic information from data is not necessary if we can leverage an external syntactic parser, which provides better parsing quality with well-defined syntactic structures. This could potentially improve the Transformer's performance and sample efficiency. In this work, we propose a syntax-guided localized self-attention for the Transformer that allows grammar structures from an external constituency parser to be incorporated directly. It prevents the attention mechanism from weighting grammatically distant tokens more heavily than close ones. Experimental results show that our model consistently improves translation performance on a variety of machine translation datasets, ranging from small to large dataset sizes and covering different source languages.
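
The abstract does not spell out how the attention is localized, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a precomputed pairwise syntactic distance matrix (here called syn_dist, a hypothetical input that might be derived from a constituency parse) and a hypothetical cutoff max_dist, and it uses hard masking as a stand-in technique: keys that are grammatically farther than the cutoff receive zero attention weight.

```python
import numpy as np

def localized_attention(scores, syn_dist, max_dist):
    """Softmax over attention logits, restricted to syntactically close keys.

    scores:   (n, n) raw attention logits (query rows, key columns).
    syn_dist: (n, n) pairwise syntactic distances (hypothetical input,
              e.g. derived from an external constituency parse).
    max_dist: hypothetical cutoff; keys farther than this are masked out.
    """
    scores = np.asarray(scores, dtype=float)
    allowed = np.asarray(syn_dist) <= max_dist
    # A token may always attend to itself, so no row is fully masked.
    np.fill_diagonal(allowed, True)
    masked = np.where(allowed, scores, -np.inf)
    # Subtract the row maximum for numerical stability before exponentiating.
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy example (assumed distances for "the cat sat" parsed as ((the cat) sat)).
syn_dist = np.array([[0, 1, 2],
                     [1, 0, 2],
                     [2, 2, 0]])
scores = np.zeros((3, 3))  # uniform logits, so only the mask shapes the result
weights = localized_attention(scores, syn_dist, max_dist=1)
```

With uniform logits, "the" and "cat" split their attention between each other while "sat" attends only to itself, because its distance to both other tokens exceeds the assumed cutoff. A soft penalty on distant tokens would be an equally plausible reading of the abstract.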

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
