CL | Mar 16, 2021

Gumbel-Attention for Multi-modal Machine Translation

arXiv:2103.08862v2 | 34 citations
AI Analysis

This paper addresses noise from irrelevant visual information in multi-modal machine translation, but the contribution is incremental, since it builds on existing attention-based methods.

The paper tackles the problem of irrelevant visual information in multi-modal machine translation by proposing Gumbel-Attention, which selects text-related image features; the authors report improved translation quality in their experiments.

Multi-modal machine translation (MMT) improves translation quality by introducing visual information. However, existing MMT models ignore the problem that an image brings information irrelevant to the text, which introduces considerable noise into the model and degrades translation quality. This paper proposes a novel Gumbel-Attention for multi-modal machine translation, which selects the text-related parts of the image features. Specifically, unlike previous attention-based methods, we first use a differentiable method to select the image information and automatically remove the useless parts of the image features. Experiments show that our method retains the image features related to the text, and that the remaining parts help the MMT model generate better translations.
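
The abstract does not spell out the formulation, so the following is only a minimal sketch of the general idea, assuming the selection is done with the standard Gumbel-softmax trick applied to text-to-image attention scores. The function name `gumbel_attention`, the temperature `tau`, and the `hard` flag are hypothetical names chosen for illustration, not taken from the paper. The sketch uses NumPy and shows the forward pass only; in a trained model the hard selection would typically be paired with a straight-through gradient estimator to stay differentiable.

```python
import numpy as np

def gumbel_attention(text, image, tau=1.0, hard=True, rng=None):
    """Illustrative sketch (not the paper's exact method).

    text:  (n_tokens, d)  text token representations (queries)
    image: (n_regions, d) image region features (keys and values)
    Returns (selected_features, weights) with shapes
    (n_tokens, d) and (n_tokens, n_regions).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = text.shape[-1]

    # Scaled dot-product scores between each token and each image region.
    scores = text @ image.T / np.sqrt(d)

    # Sample Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
    u = rng.uniform(low=1e-9, high=1.0, size=scores.shape)
    gumbel = -np.log(-np.log(u))

    # Temperature-scaled softmax over the perturbed scores.
    logits = (scores + gumbel) / tau
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    if hard:
        # Discrete selection: each token keeps exactly one image region,
        # so all other regions are dropped for that token.
        idx = weights.argmax(axis=-1)
        weights = np.eye(image.shape[0])[idx]

    return weights @ image, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(3, 8))    # 3 tokens, hypothetical dimension 8
    image = rng.normal(size=(5, 8))   # 5 image regions
    selected, w = gumbel_attention(text, image, tau=0.5, hard=True, rng=rng)
    print(selected.shape, w.sum(axis=-1))
```

With `hard=True` each row of the weight matrix is one-hot, which is what distinguishes this kind of selection from ordinary soft attention, where every image region contributes at least a little to every token.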

