CL | CV | May 11, 2017

Imagination improves Multimodal Translation

arXiv:1705.04350v2 | 147 citations
Originality: Incremental advance
AI Analysis

This paper addresses multimodal translation, but the contribution appears incremental because it builds on existing multitask learning and attention-based methods.

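The paper tackles multimodal translation by decomposing it into two sub-tasks, translation and visually grounded representation learning, trained together in a multitask framework. This improves on the state of the art on the Multi30K dataset. The approach is equally effective when the image prediction task is trained on the external MS COCO dataset, and translation improves when the translation model is trained on the external News Commentary parallel text.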

We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
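The two-task decomposition described above can be illustrated with a minimal sketch of a combined training objective. The abstract does not state which loss functions are used or how the two objectives are combined, so everything here is an assumption for illustration: the function names are hypothetical, the image vector is predicted by mean-pooling the shared encoder states, the grounding loss is a cosine distance, and the two losses are mixed with a single `weight` parameter.

```python
import math


def mean_pool(hidden_states):
    """Average a list of encoder hidden-state vectors into one vector."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(h[i] for h in hidden_states) / n for i in range(dim)]


def cosine_distance(u, v):
    """Return 1 - cos(u, v); 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cross_entropy(token_probs):
    """Mean negative log-likelihood of the reference target tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multitask_loss(token_probs, hidden_states, image_vector, weight=0.5):
    """Interpolate a translation loss and an image prediction loss.

    Assumptions (not taken from the source): mean pooling, cosine
    distance, and linear interpolation with `weight`.
    """
    # Task 1: learning to translate (decoder probabilities of reference tokens).
    translation = cross_entropy(token_probs)
    # Task 2: learning grounded representations by predicting the image vector
    # from the same encoder states that the translation decoder attends over.
    predicted = mean_pool(hidden_states)
    grounding = cosine_distance(predicted, image_vector)
    return weight * translation + (1.0 - weight) * grounding


# Toy example: two source tokens, 2-dimensional states and image features.
loss = multitask_loss(
    token_probs=[0.5, 0.5],
    hidden_states=[[1.0, 0.0], [0.0, 1.0]],
    image_vector=[1.0, 1.0],
    weight=0.5,
)
print(loss)
```

Because the two losses only share the encoder, the image prediction term can in principle be computed on a different dataset from the translation term, which matches the abstract's observation that the image prediction task can be trained on MS COCO and the translation model on News Commentary.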

