CUNI System for the WMT17 Multimodal Translation Task
This is an incremental contribution aimed at improving translation accuracy on multimodal machine translation benchmarks.
The authors tackled the WMT17 Multimodal Translation Task with two systems. For Task 1, they developed a purely textual neural translation system trained with additional data selected from parallel corpora and synthesized by back-translation. For Task 2, they generated English captions and translated them with the Task 1 system. No specific performance numbers are reported.
In this paper, we describe our submissions to the WMT17 Multimodal Translation Task. For Task 1 (multimodal translation), our best-scoring system is a purely textual neural translation of the source image caption into the target language. The main feature of the system is the use of additional data acquired by selecting similar sentences from parallel corpora and by data synthesis with back-translation. For Task 2 (cross-lingual image captioning), our best submitted system generates an English caption, which is then translated by the best system used in Task 1. We also present negative results, based on ideas that we believe have the potential to bring improvements but that did not prove useful in our particular setup.
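To make the two data-augmentation ideas concrete, the following is a minimal Python sketch, not the system described in this paper. It assumes a simple vocabulary-overlap score as the similarity criterion (the abstract does not specify the actual selection method), and every name in it (`tokenize`, `overlap_score`, `select_similar`, `back_translate`, `reverse_translate`) is hypothetical and introduced only for illustration.

```python
def tokenize(sentence):
    # Naive lowercased whitespace tokenization; a real pipeline would
    # use a proper tokenizer. This is an assumption of the sketch.
    return sentence.lower().split()


def overlap_score(sentence, in_domain_vocab):
    # Fraction of tokens in the sentence that also occur in the
    # in-domain vocabulary. A stand-in similarity measure only.
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in in_domain_vocab)
    return hits / len(tokens)


def select_similar(in_domain_sentences, candidate_pairs, top_k):
    # candidate_pairs: list of (source, target) sentence pairs taken
    # from a general parallel corpus. Returns the top_k pairs whose
    # source side looks most like the in-domain captions.
    vocab = set()
    for sentence in in_domain_sentences:
        vocab.update(tokenize(sentence))
    ranked = sorted(
        candidate_pairs,
        key=lambda pair: overlap_score(pair[0], vocab),
        reverse=True,
    )
    return ranked[:top_k]


def back_translate(target_sentences, reverse_translate):
    # reverse_translate: any callable mapping a target-language
    # sentence to a source-language sentence (hypothetical here; in
    # practice a trained target-to-source translation model).
    # Returns synthetic (source, target) training pairs.
    return [(reverse_translate(t), t) for t in target_sentences]
```

In this sketch, the selected pairs and the synthetic pairs would simply be added to the original training data. The Task 2 pipeline from the abstract follows the same compositional pattern: an English captioning model produces a caption, and the Task 1 translation system is applied to its output.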