CL · Sep 5, 2019

Multi-Granularity Self-Attention for Neural Machine Translation

arXiv:1909.02222v11010 citations
Originality: Incremental advance
AI Analysis

This paper addresses weak structure modeling in NMT, but the advance is incremental because it builds on existing self-attention methods.

The paper tackles the lack of explicit phrase modeling in neural machine translation by proposing multi-granularity self-attention (Mg-Sa), in which several attention heads attend to phrases rather than individual words. The abstract reports consistent improvements on the WMT14 English-to-German and NIST Chinese-to-English translation tasks.

Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases has produced substantial improvements, suggesting the possibility of improving NMT performance from explicit modeling of phrases. In this work, we present multi-granularity self-attention (Mg-Sa): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases in either n-gram or syntactic formalism. Moreover, we exploit interactions among phrases to enhance the strength of structure modeling - a commonly-cited weakness of self-attention. Experimental results on WMT14 English-to-German and NIST Chinese-to-English translation tasks show the proposed approach consistently improves performance. Targeted linguistic analysis reveals that Mg-Sa indeed captures useful phrase information at various levels of granularities.
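
The mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes non-overlapping n-grams as phrases, mean pooling as the phrase composition, and no learned projections, and the names `ngram_phrases`, `attend`, and `multi_granularity_attention` are hypothetical. It shows only the core idea: each head lets word-level queries attend over a memory built at a different granularity.

```python
import math

def ngram_phrases(vectors, n):
    """Mean-pool consecutive, non-overlapping n-grams into phrase vectors.
    (Assumption: mean pooling stands in for the paper's phrase composition.)"""
    phrases = []
    for start in range(0, len(vectors), n):
        chunk = vectors[start:start + n]
        dim = len(chunk[0])
        phrases.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return phrases

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, memory):
    """Scaled dot-product attention; memory vectors act as both keys and values."""
    dim = len(memory[0])
    scale = math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / scale for m in memory]
        weights = softmax(scores)
        outputs.append([sum(w * m[d] for w, m in zip(weights, memory))
                        for d in range(dim)])
    return outputs

def multi_granularity_attention(tokens, granularities=(1, 2, 3)):
    """One head per granularity; per-token head outputs are concatenated.
    Granularity 1 reduces to ordinary word-level self-attention."""
    heads = [attend(tokens, ngram_phrases(tokens, n)) for n in granularities]
    return [sum((head[i] for head in heads), []) for i in range(len(tokens))]

if __name__ == "__main__":
    # Four toy 2-dimensional token vectors (illustrative values only).
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for row in multi_granularity_attention(tokens):
        print([round(x, 3) for x in row])
```

In this sketch the output for each token has one slice per granularity, so three heads over 2-dimensional inputs yield 6-dimensional outputs. A syntactic variant would replace `ngram_phrases` with spans taken from a parse tree, which is outside the scope of this example.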

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
