LIUM-CVC Submissions for WMT17 Multimodal Translation Task
This work targets translation quality in multimodal settings and is relevant to natural language processing researchers, though its contribution is incremental because it builds on existing architectures.
The paper tackles multimodal neural machine translation by integrating either global visual features or convolutional feature maps to exploit visual context. Its final systems ranked first on WMT17 for both the En-De and En-Fr language pairs according to the automatic metrics METEOR and BLEU.
This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for the WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU.
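
To make the contrast between the two kinds of visual input concrete, the following is a minimal sketch and not the submitted systems. The abstract does not say how the features are integrated, so both mechanisms here are assumptions chosen for illustration: a global feature vector is projected to an initial decoder state, and a convolutional feature map is summarized by dot-product attention over its spatial locations. The function names `init_state_from_global` and `attend_feature_map` are hypothetical.

```python
import math


def init_state_from_global(global_feat, weights, bias):
    """Project one global image vector to an initial decoder state.

    Assumption for illustration: a single tanh layer. `weights` is a list
    of rows, one row per output dimension.
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, global_feat)) + b)
        for row, b in zip(weights, bias)
    ]


def attend_feature_map(query, feature_map):
    """Dot-product attention over the spatial locations of a feature map.

    `feature_map` is a list of location vectors, each the same length as
    `query` (for example, a decoder state). Returns the attention weights
    and the weighted-sum visual context vector.
    """
    scores = [sum(q * f for q, f in zip(query, loc)) for loc in feature_map]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(query)
    context = [
        sum(a * loc[i] for a, loc in zip(alphas, feature_map))
        for i in range(dim)
    ]
    return alphas, context


if __name__ == "__main__":
    # Toy inputs: a 2-dimensional global feature and a map with 2 locations.
    h0 = init_state_from_global([0.5, -0.5], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    alphas, context = attend_feature_map(h0, [[1.0, 0.0], [0.0, 1.0]])
    print(h0, alphas, context)
```

The global vector is consumed once, whereas the feature map can be re-attended at every decoding step, which is why the two inputs lead to different architectures.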