Understanding Neural Machine Translation by Simplification: The Case of Encoder-free Models
The work gives researchers insight into how NMT models operate, but the contribution is incremental because it builds on existing simplification approaches.
The paper studies neural machine translation by simplifying the architecture to encoder-free models, in which the source is represented by word and positional embeddings with no encoder on top. It finds that attention acts as a strong feature extractor and that the learned word embeddings are competitive with those of conventional models, but that non-contextualized source representations cause a large performance drop.
In this paper, we try to understand neural machine translation (NMT) by simplifying NMT architectures and training encoder-free NMT models. In an encoder-free model, the source is represented by the sums of word embeddings and positional embeddings. The decoder is a standard Transformer or recurrent neural network that attends directly to these embeddings via attention mechanisms. Experimental results show (1) that the attention mechanism in encoder-free models acts as a strong feature extractor, (2) that the word embeddings in encoder-free models are competitive with those in conventional models, (3) that non-contextualized source representations lead to a big performance drop, and (4) that encoder-free models have different effects on alignment quality for German-English and Chinese-English.
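To make the encoder-free setup concrete, the following is a minimal sketch of a source representation built as the sum of word and positional embeddings, with one decoder-side attention step over it. It is an illustration under stated assumptions, not the paper's implementation: the sinusoidal positional scheme, the scaled dot-product scoring, the random embedding table, and all names (`sinusoidal_position`, `encoder_free_source`, `attend`) are hypothetical choices made here.

```python
import math
import random


def sinusoidal_position(pos, dim):
    """Sinusoidal positional embedding for one position (assumed scheme)."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def encoder_free_source(token_ids, embedding_table):
    """Represent each source token as word embedding + positional embedding.

    No encoder layers are applied, so each vector is non-contextualized:
    it depends only on the token and its position.
    """
    dim = len(embedding_table[0])
    return [
        [w + p for w, p in zip(embedding_table[tok], sinusoidal_position(pos, dim))]
        for pos, tok in enumerate(token_ids)
    ]


def attend(query, source):
    """Scaled dot-product attention of one decoder query over the source."""
    scale = math.sqrt(len(query))
    scores = [sum(q * s for q, s in zip(query, vec)) / scale for vec in source]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * vec[i] for w, vec in zip(weights, source))
        for i in range(len(query))
    ]
    return weights, context


# Toy example: random values stand in for trained parameters (assumption).
random.seed(0)
VOCAB_SIZE, DIM = 10, 8
table = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
source = encoder_free_source([3, 1, 4], table)
query = [random.gauss(0, 1) for _ in range(DIM)]
weights, context = attend(query, source)
```

In this sketch the attention weights are the only place where source positions interact, which is one way to read the claim that attention takes over the feature-extraction role when the encoder is removed.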