CL · LG · NE · May 30, 2016

Does Multimodality Help Human and Machine for Translation and Image Captioning?

arXiv:1605.09186v4 · 87 citations
Originality: Synthesis-oriented
AI Analysis

This work examines whether multimodal data helps machine translation and image captioning. Its contribution is incremental: it builds on existing methods within a specific challenge setting.

The paper asks whether multimodal data improves translation and image captioning by comparing phrase-based and neural models trained on monomodal versus multimodal inputs. The authors' systems achieved the best BLEU and METEOR scores in the WMT16 challenge.

This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural network models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
