Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Machine Translation System Report
This work addresses multimodal translation for language pairs such as English-German and English-Czech, but its contribution is incremental: it builds on existing methods and combines them through ensembling.
The paper tackles multimodal machine translation by feeding image features to the decoder and applying sequence-level training methods, achieving the best performance in three WMT18 subtasks.
This paper describes the multimodal machine translation systems developed jointly by Oregon State University and Baidu Research for the WMT 2018 Shared Task on multimodal translation. We introduce a simple approach to incorporating image information by feeding image features to the decoder side. We also explore different sequence-level training methods, including scheduled sampling and reinforcement learning, which lead to substantial improvements. Our systems ensemble several models that use different architectures and training methods, and they achieve the best performance on three subtasks: En-De and En-Cs in task 1 and (En+De+Fr)-Cs in task 1B.
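As an illustration of one of the sequence-level training methods named in the abstract, the following is a minimal sketch of how scheduled sampling chooses decoder inputs during training. It is a generic rendering of the technique, not the authors' implementation: the function name `scheduled_sampling_inputs`, the callback `predict_next`, the parameter `sample_prob`, and the `<s>` start symbol are all hypothetical names introduced here, and a real system would run a neural decoder (here conditioned on image features) in place of the toy predictor.

```python
import random


def scheduled_sampling_inputs(gold_tokens, predict_next, sample_prob, rng, bos="<s>"):
    """Pick the decoder inputs for one training sequence.

    At every step after the first, the decoder is fed the model's own
    previous prediction with probability `sample_prob`, and the previous
    gold token otherwise. `predict_next` is a hypothetical stand-in for
    the model: it maps the inputs consumed so far to a predicted token.
    """
    inputs = []
    prev_gold = bos  # what teacher forcing would feed at this step
    prev_pred = bos  # what the model itself produced at the previous step
    for step, gold in enumerate(gold_tokens):
        if step > 0 and rng.random() < sample_prob:
            inputs.append(prev_pred)  # expose the model to its own output
        else:
            inputs.append(prev_gold)  # ordinary teacher forcing
        prev_pred = predict_next(inputs)
        prev_gold = gold
    return inputs


def toy_predict(prefix):
    """Toy predictor (assumption): returns a token that encodes the prefix length."""
    return "tok%d" % len(prefix)


gold = ["ein", "Hund", "rennt"]
teacher_forced = scheduled_sampling_inputs(gold, toy_predict, 0.0, random.Random(0))
fully_sampled = scheduled_sampling_inputs(gold, toy_predict, 1.0, random.Random(0))
```

With `sample_prob` set to 0 this reduces to plain teacher forcing, and with 1 the decoder sees only its own predictions after the start symbol. In practice the sampling probability is usually raised gradually over training so that the model is exposed to its own errors only once its predictions are reasonable.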