CV · CL · May 31, 2022

VALHALLA: Visual Hallucination for Machine Translation

arXiv:2206.00100v1 · 55 citations · h-index: 83
Originality: Incremental advance
AI Analysis

This work addresses a practical barrier for researchers and practitioners in machine translation. It enables multimodal translation without requiring paired images at inference. The contribution is incremental, since it builds on existing multimodal approaches.

The paper tackles a limitation of multimodal machine translation systems: they require paired text and image inputs at inference. It introduces VALHALLA, a framework that instead hallucinates visual representations from the source text. On three standard datasets, VALHALLA outperforms both text-only baselines and state-of-the-art methods.
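The inference-time flow described above can be sketched in a few lines. The text goes in, discrete visual tokens are hallucinated autoregressively, and the combined sequence is passed on for translation. This is a minimal illustration, not the VALHALLA implementation. Several details are hypothetical assumptions: the scorer `toy_scores`, the codebook size of 4, greedy decoding, and combining modalities by concatenating text tokens with placeholder visual tokens `<vK>`.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer signature: given the source tokens and the visual tokens
# generated so far, return one score per entry of a discrete visual codebook.
Scorer = Callable[[Sequence[str], Sequence[int]], List[float]]


def hallucinate_visual_tokens(source: Sequence[str], scorer: Scorer, length: int) -> List[int]:
    """Greedily decode `length` discrete visual token ids, one at a time."""
    tokens: List[int] = []
    for _ in range(length):
        scores = scorer(source, tokens)  # conditioned on text and the prefix so far
        best = max(range(len(scores)), key=scores.__getitem__)  # greedy argmax
        tokens.append(best)
    return tokens


def build_translation_input(source: Sequence[str], visual: Sequence[int]) -> List[str]:
    """Combine text and hallucinated visual tokens by simple concatenation (assumed)."""
    return list(source) + [f"<v{t}>" for t in visual]


def toy_scores(source: Sequence[str], prefix: Sequence[int]) -> List[float]:
    """Toy stand-in for the hallucination transformer, with a codebook of size 4."""
    k = (len(source) + len(prefix)) % 4
    return [1.0 if i == k else 0.0 for i in range(4)]


src = ["a", "dog", "runs"]
visual = hallucinate_visual_tokens(src, toy_scores, length=3)
print(visual)  # [3, 0, 1]
print(build_translation_input(src, visual))
# ['a', 'dog', 'runs', '<v3>', '<v0>', '<v1>']
```

The sketch needs only the source sentence, because the hallucinated tokens take the place of a real image. A real system would feed the combined sequence to a translation transformer rather than return it.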

Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. In particular, given a source sentence, an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are then used to obtain the target translation. We train the hallucination transformer jointly with the translation transformer using standard backpropagation with cross-entropy losses, guided by an additional loss that encourages consistency between predictions made with either ground-truth or hallucinated visual representations. Extensive experiments on three standard translation datasets with a diverse set of language pairs demonstrate the effectiveness of our approach over both text-only baselines and state-of-the-art methods. Project page: http://www.svcl.ucsd.edu/projects/valhalla.
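The abstract's training objective combines cross-entropy terms with a consistency term. The sketch below computes such a joint loss for one example in plain Python, so the arithmetic can be checked by hand. It rests on several assumptions that are not confirmed by the text above. The consistency term is taken to be a KL divergence between the per-position translation distributions obtained with ground-truth and with hallucinated visual tokens. All terms are weighted equally except a hypothetical consistency weight `lam`. Names such as `joint_loss` are illustrative and are not taken from the paper's code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target index under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def mean_cross_entropy(all_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Average token-level cross-entropy over a sequence."""
    return sum(cross_entropy(l, t) for l, t in zip(all_logits, targets)) / len(targets)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def joint_loss(vis_logits, vis_targets, mt_logits_gt, mt_logits_hal, tgt_targets, lam=1.0):
    """Hypothetical joint objective: three cross-entropies plus weighted consistency."""
    # Hallucination: predict the ground-truth discrete visual tokens from text.
    l_vis = mean_cross_entropy(vis_logits, vis_targets)
    # Translation conditioned on ground-truth visual tokens.
    l_gt = mean_cross_entropy(mt_logits_gt, tgt_targets)
    # Translation conditioned on hallucinated visual tokens.
    l_hal = mean_cross_entropy(mt_logits_hal, tgt_targets)
    # Consistency: mean KL between the two translation distributions per position.
    l_cons = sum(
        kl_divergence(softmax(g), softmax(h)) for g, h in zip(mt_logits_gt, mt_logits_hal)
    ) / len(tgt_targets)
    return l_vis + l_gt + l_hal + lam * l_cons


# Uniform logits give cross-entropy log(n); identical translation logits give zero KL.
loss = joint_loss(
    vis_logits=[[0.0, 0.0]], vis_targets=[0],
    mt_logits_gt=[[0.0, 0.0, 0.0, 0.0]], mt_logits_hal=[[0.0, 0.0, 0.0, 0.0]],
    tgt_targets=[1],
)
print(round(loss, 4))  # log 2 + 2 * log 4 = 5 * log 2, about 3.4657
```

In a real system these logits would come from the hallucination and translation transformers, and the sum would be minimized by backpropagation. The consistency term pushes the hallucinated branch toward the predictions of the ground-truth branch. That pressure is what lets the model drop the image at inference.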

Code Implementations: 1 repo