CL · AI · IR · MA · MM · May 7, 2018

Multimodal Machine Translation with Reinforcement Learning

arXiv:1805.02356v1 · 15 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving translation accuracy for multimodal tasks. The contribution is incremental: it applies an existing RL method (Advantage Actor-Critic) to a specific domain.

The paper tackles multimodal machine translation by combining reinforcement learning with image and text inputs, and reports better results than supervised learning baselines on the Multi30K and Flickr30K Entities datasets.

Multimodal machine translation is one of the applications that integrate computer vision and language processing. It is a unique task given that, in the field of machine translation, many state-of-the-art algorithms still employ only textual information. In this work, we explore the effectiveness of reinforcement learning in multimodal machine translation. We present a novel algorithm based on the Advantage Actor-Critic (A2C) algorithm that specifically caters to the multimodal machine translation task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We evaluate our proposed algorithm on the Multi30K multilingual English-German image description dataset and the Flickr30K image entity dataset. Our model takes two channels of input, image and text, uses translation evaluation metrics as training rewards, and achieves better results than supervised maximum likelihood estimation (MLE) baseline models. Furthermore, we discuss the prospects and limitations of using reinforcement learning for machine translation. Our experimental results suggest a promising reinforcement learning solution to the general task of multimodal sequence-to-sequence learning.
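The abstract combines two ingredients: an Advantage Actor-Critic (A2C) learner and a translation evaluation metric used as the training reward. The following is a minimal sketch of how these fit together for one sampled translation. It assumes a smoothed sentence-level BLEU as the reward and a reward delivered only after the final token; the function names `sentence_bleu` and `a2c_losses`, the add-one smoothing, and the discount `gamma` are hypothetical choices for illustration, not the paper's implementation. It works on plain Python numbers, so it shows the loss arithmetic but not gradient flow through the image and text encoders.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Hypothetical reward: sentence-level BLEU with add-one smoothing.

    The paper only states that translation metrics serve as rewards;
    this particular metric and smoothing are assumptions.
    """
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hypothesis, n)
        ref_counts = ngram_counts(reference, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it appears in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = max(len(hypothesis) - n + 1, 0)
        # Add-one smoothing keeps every precision positive for short sentences.
        log_precision_sum += math.log((matches + 1) / (total + 1))
    # Brevity penalty for hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def a2c_losses(log_probs, values, reward, gamma=1.0):
    """Actor and critic losses for one sampled translation (hypothetical helper).

    log_probs[t]: log-probability of the sampled token y_t given y_<t, image, and source.
    values[t]:    critic estimate V(s_t) of the state before emitting token t.
    reward:       terminal sequence-level reward, received after the last token.
    """
    steps = len(log_probs)
    # Discounted return from each step; only the final step receives the reward.
    returns = [reward * gamma ** (steps - 1 - t) for t in range(steps)]
    # Advantage: how much better the observed return was than the critic expected.
    advantages = [g - v for g, v in zip(returns, values)]
    # Actor (policy-gradient) loss: raise log-probs of tokens with positive advantage.
    actor_loss = -sum(a * lp for a, lp in zip(advantages, log_probs)) / steps
    # Critic loss: squared error between V(s_t) and the observed return.
    critic_loss = sum(a * a for a in advantages) / steps
    return actor_loss, critic_loss, advantages


if __name__ == "__main__":
    hypothesis = "a dog runs".split()
    reference = "a dog runs on grass".split()
    reward = sentence_bleu(hypothesis, reference)
    actor, critic, adv = a2c_losses([-0.7, -0.4, -0.2], [0.1, 0.2, 0.3], reward)
    print(f"reward={reward:.3f} actor_loss={actor:.3f} critic_loss={critic:.3f}")
```

In an autodiff framework, the advantage inside the actor term would be detached so that only the critic loss updates the value estimate; the two losses are then typically summed with a weighting coefficient.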
