CL, Dec 6, 2017

Multi-channel Encoder for Neural Machine Translation

Hao Xiong, Zhongjun He, Xiaoguang Hu, Hua Wu

arXiv:1712.02109v1, 5.0, 34 citations, h-index: 53

Originality: Highly original

AI Analysis

This work addresses a bottleneck in neural machine translation: the encoder composes the source sentence in a largely uniform way. More flexible source encoding is proposed as a route to better translation quality.

The paper tackles the largely uniform composition of the source sentence in neural machine translation. It proposes a Multi-channel Encoder (MCE) that supplies encodings at different levels of composition, improving on the DL4MT baseline by 6.52 BLEU points on Chinese-English translation and reaching BLEU=38.8 on WMT14 English-French, comparable to state-of-the-art deep models.

The attention-based encoder-decoder is an effective architecture for neural machine translation (NMT). It typically relies on recurrent neural networks (RNNs) to build the blocks that are later called by the attentive reader during decoding. This encoder design yields a relatively uniform composition of the source sentence, despite the gating mechanism employed in the encoding RNN. On the other hand, we often want the decoder to take pieces of the source sentence at varying levels of composition that suit its own linguistic structure: for example, we may want to take an entity name in its raw form while taking an idiom as a perfectly composed unit. Motivated by this demand, we propose the Multi-channel Encoder (MCE), which enhances encoding components with different levels of composition. More specifically, in addition to the hidden state of the encoding RNN, MCE takes 1) the original word embedding for raw encoding with no composition, and 2) a particular design of external memory, as in the Neural Turing Machine (NTM), for more complex composition, while all three encoding strategies are properly blended during decoding. An empirical study on Chinese-English translation shows that our model improves by 6.52 BLEU points over a strong open-source NMT system, DL4MT. On the WMT14 English-French task, our single shallow system achieves BLEU=38.8, comparable with state-of-the-art deep models.
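The blending of the three channels can be illustrated with a minimal sketch. The abstract does not specify how the channels are combined, so this assumes a softmax gate that forms a convex combination of the word embedding, the RNN hidden state, and a memory read for one source position. The function names, the gate logits, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_channels(embedding, rnn_state, memory, gate_logits):
    """Blend three same-sized encodings of one source position.

    embedding   : raw word embedding (no composition)
    rnn_state   : encoder RNN hidden state
    memory      : external-memory read (stand-in for the NTM channel)
    gate_logits : three scores (hypothetical); a real model would compute
                  them from learned parameters and the decoder state
    """
    if not (len(embedding) == len(rnn_state) == len(memory)):
        raise ValueError("all channels must share one dimensionality")
    w_emb, w_rnn, w_mem = softmax(gate_logits)
    return [w_emb * e + w_rnn * h + w_mem * m
            for e, h, m in zip(embedding, rnn_state, memory)]


# Toy vectors for a single source position (assumed values).
emb = [1.0, 0.0]
hid = [0.0, 1.0]
mem = [0.5, 0.5]

# Equal logits weight the three channels equally.
blended = blend_channels(emb, hid, mem, [0.0, 0.0, 0.0])
print(blended)
```

With a gate of this kind, a large logit on the embedding channel would pass a word through nearly raw (the entity-name case), while a large logit on the memory channel would favor the more heavily composed encoding (the idiom case).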

