UMONS Submission for WMT18 Multimodal Translation Task
This work aims to improve translation quality in multimodal settings, a problem relevant to both researchers and practitioners. The contribution is incremental, since it builds on existing neural image captioning methods.
The paper tackles multimodal machine translation by introducing a novel deepGRU architecture, which achieves the best METEOR scores on the English-to-German and English-to-French tasks with images.
This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the Third Conference on Machine Translation (WMT18). We explore a novel architecture, called deepGRU, based on recent findings in the related task of Neural Image Captioning (NIC). The models presented in the following sections achieve the best METEOR translation score on both constrained sub-tasks: (English, image) -> German and (English, image) -> French.