Two-Headed Monster And Crossed Co-Attention Networks
This work targets machine translation accuracy and is mainly of interest to NLP researchers. It appears incremental, since it builds on the Transformer model with specific architectural modifications.
The paper tackles the problem of improving neural transduction models by proposing a new co-attention mechanism. The resulting model outperforms the Transformer baseline by 0.51 (big) to 0.74 (base) BLEU points on the WMT 2014 EN-DE translation task and by 0.17 (big) to 0.47 (base) BLEU points on WMT 2016 EN-FI.
This paper presents a preliminary investigation of a new co-attention mechanism in neural transduction models. We propose a paradigm, termed Two-Headed Monster (THM), which consists of two symmetric encoder modules and one decoder module connected with co-attention. As a specific and concrete implementation of THM, Crossed Co-Attention Networks (CCNs) are designed based on the Transformer model. We evaluate CCNs on the WMT 2014 EN-DE and WMT 2016 EN-FI translation tasks, where our model outperforms the strong Transformer baseline by 0.51 (big) and 0.74 (base) BLEU points on EN-DE and by 0.17 (big) and 0.47 (base) BLEU points on EN-FI.
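The abstract does not spell out how the two encoders are wired together. The following is a minimal, hypothetical sketch of one plausible reading of "crossed" co-attention. It assumes that each encoder stream forms its queries from its own states and takes its keys and values from the other stream. To keep it short, it also uses a single head, identity query/key/value projections, and no residual connections or layer normalization. The function names (`attention`, `crossed_co_attention`) are illustrative only and are not taken from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain row-by-column matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(r) for r in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    # Numerically stable softmax applied independently to each row.
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)


def crossed_co_attention(h_a: Matrix, h_b: Matrix) -> Tuple[Matrix, Matrix]:
    # Assumed "crossed" wiring: stream A queries stream B, and stream B queries stream A.
    # Both streams must share the model dimension; their lengths may differ.
    new_a = attention(h_a, h_b, h_b)
    new_b = attention(h_b, h_a, h_a)
    return new_a, new_b


h_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_b = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
new_a, new_b = crossed_co_attention(h_a, h_b)
print(new_a)
print(new_b)
```

In this sketch each output row of `new_a` is a convex combination of the rows of `h_b`, and each output row of `new_b` is a convex combination of the rows of `h_a`. Each stream's representation is therefore rebuilt from the other stream's content. A Transformer-based variant such as the one the abstract describes would add learned projections, multiple heads, residual connections, and feed-forward sublayers around this core, and the decoder would then attend to both encoder outputs.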