CL LG Jul 19, 2021

Residual Tree Aggregation of Layers for Neural Machine Translation

arXiv:2107.14590v1
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in Neural Machine Translation, the underuse of intermediate layer outputs, in order to improve translation quality. It is an incremental advance because it builds on the existing Transformer architecture.

The paper tackles the problem of insufficient use of intermediate layer outputs in Transformer-based Neural Machine Translation by proposing a residual tree aggregation method, which significantly outperforms strong baselines on WMT14 English-German and WMT17 English-French tasks.

Although attention-based Neural Machine Translation has achieved remarkable progress in recent years, it still suffers from the issue of making insufficient use of the output of each layer. The Transformer uses only the top layer of the encoder and decoder in subsequent processing, which makes it impossible to take advantage of the useful information in other layers. To address this issue, we propose a residual tree aggregation of layers for Transformer (RTAL), which helps to fuse information across layers. Specifically, we fuse the information across layers by constructing a post-order binary tree. In addition to the last node, we add the residual connection to the process of generating child nodes. Our model is based on the Neural Machine Translation model Transformer, and we conduct our experiments on the WMT14 English-to-German and WMT17 English-to-French translation tasks. Experimental results across language pairs show that the proposed approach significantly outperforms the strong baseline model.
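
The abstract gives no equations, so the following is only a rough sketch of what fusing layer outputs through a post-order binary tree with residual connections could look like. Everything in it is an assumption made for illustration: the function names `average` and `aggregate`, the balanced split of the layer list, the use of an element-wise average as the fusion step (a real model would presumably use a learned function), and the choice to add the right-hand child back as the residual at every node. It should not be read as the paper's actual formulation.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def average(a: Vector, b: Vector) -> Vector:
    """Hypothetical fusion step: element-wise mean of two child vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def aggregate(
    layers: Sequence[Vector],
    combine: Callable[[Vector, Vector], Vector] = average,
) -> Vector:
    """Fuse per-layer outputs into one vector via a binary tree.

    The recursion visits the left subtree, then the right subtree, then the
    node itself, which is a post-order traversal. At each internal node the
    fused vector gets the right child added back (assumed residual form).
    """
    if not layers:
        raise ValueError("need at least one layer output")
    if len(layers) == 1:
        # Leaf: a single layer output is returned unchanged (as a copy).
        return list(layers[0])
    mid = len(layers) // 2
    left = aggregate(layers[:mid], combine)
    right = aggregate(layers[mid:], combine)
    fused = combine(left, right)
    # Assumed residual connection: add the higher-layer child to the fusion.
    return [f + r for f, r in zip(fused, right)]


if __name__ == "__main__":
    # Two toy "layer outputs" of dimension 2.
    print(aggregate([[1.0, 2.0], [3.0, 4.0]]))  # [5.0, 7.0]
```

In this sketch the final vector stands in for the top-layer output that a standard Transformer would pass on, so lower layers contribute to the result instead of being discarded.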

