Multilingual Image Description with Neural Sequence Models
This work addresses the need for multilingual image descriptions. Its contribution is incremental, since it builds on existing neural methods.
The paper tackles the problem of generating image descriptions in multiple languages by combining neural machine translation and neural image description techniques, and reports significant improvements in BLEU4 and Meteor scores over a monolingual baseline on the IAPR-TC12 dataset.
In this paper we present an approach to multilingual image description that brings together insights from neural machine translation and neural image description. To generate a description of an image in a given target language, our sequence generation models condition on feature vectors from the image, the source-language description, and/or a multimodal vector computed over the image and the source-language description. In image description experiments on the IAPR-TC12 dataset, whose images are aligned with English and German sentences, we find significant and substantial improvements in BLEU4 and Meteor scores for models trained over multiple languages, compared to a monolingual baseline.
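To make the "and/or" conditioning concrete, the following is a minimal sketch, not the paper's implementation. It assumes that each available conditioning vector (image features, a source-language sentence encoding, or a multimodal vector) is linearly projected and summed into the initial hidden state of a simple Elman-style RNN decoder. The class `ConditionedRNNDecoder`, the names `image`, `source` and `multimodal`, the dimensions, and the summed-projection scheme are all hypothetical choices made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vec_add(*vs):
    """Element-wise sum of one or more equal-length vectors."""
    return [sum(vals) for vals in zip(*vs)]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ConditionedRNNDecoder:
    """Hypothetical toy decoder: the initial state h_0 depends on whichever
    conditioning vectors are supplied (image, source sentence, multimodal)."""

    def __init__(self, cond_dims, emb_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        # One projection matrix per conditioning input (assumed scheme).
        self.W_cond = {name: rand_matrix(hid_dim, d, rng) for name, d in cond_dims.items()}
        self.W_x = rand_matrix(hid_dim, emb_dim, rng)  # word-embedding input weights
        self.W_h = rand_matrix(hid_dim, hid_dim, rng)  # recurrent weights
        self.hid_dim = hid_dim

    def init_state(self, **cond):
        # h_0 = tanh(sum_k W_k c_k) over the conditioning vectors actually given,
        # so the same decoder can condition on the image only, the source
        # description only, or any combination ("and/or").
        terms = [[0.0] * self.hid_dim]
        for name, vec in cond.items():
            terms.append(matvec(self.W_cond[name], vec))
        return [math.tanh(z) for z in vec_add(*terms)]

    def step(self, h, x):
        # h_t = tanh(W_x x_t + W_h h_{t-1}); output layer omitted for brevity.
        return [math.tanh(z) for z in vec_add(matvec(self.W_x, x), matvec(self.W_h, h))]


# Usage with made-up vectors: image-only vs. image + source-language conditioning.
dec = ConditionedRNNDecoder({"image": 4, "source": 3, "multimodal": 5}, emb_dim=2, hid_dim=3)
img = [0.5, -0.2, 0.1, 0.9]
src = [0.3, 0.0, -0.4]
h0_img = dec.init_state(image=img)
h0_both = dec.init_state(image=img, source=src)
h1 = dec.step(h0_both, [1.0, 0.0])  # one decoding step from the joint state
```

Summing projections keeps the decoder identical across conditioning variants. Only the set of inputs passed to `init_state` changes, which mirrors the idea of comparing image-only, source-only, and combined models. Concatenation or attention would be equally plausible alternatives.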