CL · Jan 15, 2016

Multimodal Pivots for Image Caption Translation

Julian Hitschler, Shigehiko Schamoni, Stefan Riezler

arXiv:1601.03916v3 18.4101 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of translating image captions when little in-domain parallel data is available, offering a domain-specific solution for multimodal applications.

The paper improves statistical machine translation of image descriptions by using visual similarity to retrieve target-language captions and then reranking translation outputs against them, achieving an improvement of 1 BLEU point over strong baselines.

We present an approach to improve statistical machine translation of image descriptions by multimodal pivots defined in visual space. The key idea is to perform image retrieval over a database of images that are captioned in the target language, and use the captions of the most similar images for crosslingual reranking of translation outputs. Our approach does not depend on the availability of large amounts of in-domain parallel data, but only relies on available large datasets of monolingually captioned images, and on state-of-the-art convolutional neural networks to compute image similarities. Our experimental evaluation shows improvements of 1 BLEU point over strong baselines.
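The retrieve-then-rerank idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`retrieve_captions`, `overlap`, `rerank`), the toy 2-dimensional feature vectors standing in for CNN image features, the token-overlap relevance score, and the interpolation `weight`. It is not the scoring function used in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_captions(query_vec, database, k=2):
    """Return the target-language captions of the k most similar images.

    database: list of (feature_vector, caption) pairs (hypothetical layout).
    """
    ranked = sorted(database, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [caption for _, caption in ranked[:k]]

def overlap(hypothesis, captions):
    """Fraction of hypothesis tokens found in any retrieved caption.

    Assumed stand-in for a real relevance score.
    """
    tokens = hypothesis.lower().split()
    if not tokens:
        return 0.0
    vocab = {tok for caption in captions for tok in caption.lower().split()}
    return sum(tok in vocab for tok in tokens) / len(tokens)

def rerank(nbest, captions, weight=0.5):
    """Sort (hypothesis, model_score) pairs by an interpolated score.

    weight is an illustrative interpolation parameter, not a tuned value.
    """
    def score(item):
        hypothesis, model_score = item
        return (1 - weight) * model_score + weight * overlap(hypothesis, captions)
    return sorted(nbest, key=score, reverse=True)

# Toy data: features would come from a convolutional network in practice.
database = [
    ([1.0, 0.0], "a dog runs on the beach"),
    ([0.9, 0.1], "a dog plays in the sand"),
    ([0.0, 1.0], "a man rides a bicycle"),
]
query_vec = [1.0, 0.05]  # features of the image whose caption is translated
nbest = [
    ("a dog runs on the bank", 0.60),   # translation system's first choice
    ("a dog runs on the beach", 0.55),
]

captions = retrieve_captions(query_vec, database, k=2)
reranked = rerank(nbest, captions)
best = reranked[0][0]
```

In this toy run the two dog captions are retrieved, and the hypothesis ending in "beach" overtakes the one ending in "bank" because all of its tokens appear in the retrieved captions. Only monolingual captioned images are needed for the reranking signal, which matches the abstract's point about not relying on in-domain parallel data.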
