CL, Aug 31, 2018

The MeMAD Submission to the WMT18 Multimodal Translation Task

Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphael Troncy, Raúl Vázquez

arXiv:1808.10802v2

Originality: Synthesis-oriented

AI Analysis

This work addresses multimodal translation for researchers and practitioners, but it is incremental as it builds on existing Transformer methods with minor visual adaptations.

The paper tackled multimodal machine translation by adapting the Transformer architecture to incorporate visual features, achieving top scores in English-to-German and English-to-French tasks on the flickr18 dataset according to automatic metrics, though the visual features provided only small improvements compared to gains from the underlying text-only system and use of additional data.

This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top scoring system for both English-to-German and English-to-French, according to the automatic metrics for flickr18. Our experiments show that the effect of the visual features in our system is small. Our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.
