CL · Sep 5, 2019

Multi-Granularity Self-Attention for Neural Machine Translation

Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu

arXiv:1909.02222v130.21010 citations

Originality: Incremental advance

AI Analysis

This paper addresses weak structure modeling in neural machine translation (NMT), but the advance is incremental because it builds on existing self-attention methods.

The paper tackles the lack of explicit phrase modeling in neural machine translation by proposing multi-granularity self-attention, which improves performance on the WMT14 English-to-German and NIST Chinese-to-English translation tasks.

Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases produces substantial improvements, suggesting that NMT performance could be improved by modeling phrases explicitly. In this work, we present multi-granularity self-attention (Mg-Sa): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases in either n-gram or syntactic formalism. Moreover, we exploit interactions among phrases to enhance the strength of structure modeling, a commonly cited weakness of self-attention. Experimental results on WMT14 English-to-German and NIST Chinese-to-English translation tasks show the proposed approach consistently improves performance. Targeted linguistic analysis reveals that Mg-Sa indeed captures useful phrase information at various levels of granularity.
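The idea of letting some attention heads attend to phrases rather than individual words can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes non-overlapping n-gram phrases, mean pooling as the phrase representation, and randomly initialised projection matrices, and it leaves out the syntactic phrases and the phrase-interaction component mentioned in the abstract. All function names (`ngram_phrases`, `phrase_head`, `multi_granularity_attention`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ngram_phrases(H, n):
    """Split token states H (T, d) into non-overlapping n-grams and
    mean-pool each one (assumption: mean pooling as phrase composition)."""
    T, d = H.shape
    pad = (-T) % n  # zero rows needed so that T + pad is divisible by n
    Hp = np.vstack([H, np.zeros((pad, d))])
    # Count only real tokens in each phrase so the last mean is correct.
    mask = np.concatenate([np.ones(T), np.zeros(pad)]).reshape(-1, n)
    sums = Hp.reshape(-1, n, d).sum(axis=1)
    return sums / mask.sum(axis=1, keepdims=True)

def phrase_head(H, Wq, Wk, Wv, n):
    """One head: token-level queries attend over n-gram phrase keys/values."""
    P = ngram_phrases(H, n)
    Q, K, V = H @ Wq, P @ Wk, P @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return A @ V, A

def multi_granularity_attention(H, heads):
    """heads: list of (Wq, Wk, Wv, n). With n=1 a head reduces to
    ordinary token-level self-attention."""
    outs = [phrase_head(H, Wq, Wk, Wv, n)[0] for Wq, Wk, Wv, n in heads]
    return np.concatenate(outs, axis=-1)

# Toy usage: 7 tokens, model size 16, four heads with granularities 1 to 4.
rng = np.random.default_rng(0)
T, d, dk = 7, 16, 4
H = rng.normal(size=(T, d))
heads = [tuple(rng.normal(size=(d, dk)) for _ in range(3)) + (n,)
         for n in (1, 2, 3, 4)]
out = multi_granularity_attention(H, heads)
print(out.shape)  # (7, 16)
```

Each head keeps one query per source token, so the output length matches the input. Only the keys and values change granularity, which is one simple way to mix word-level and phrase-level heads in the same layer.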
