CL · May 28, 2023

HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

Shantipriya Parida, Idris Abdulmumin, Shamsuddeen Hassan Muhammad, Aneesh Bose, Guneet Singh Kohli, Ibrahim Said Ahmad, Ketan Kotwal, Sayan Deb Sarkar, Ondřej Bojar, Habeebah Adamu Kakudi

arXiv:2305.17690v1 · 26.6226 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the lack of resources for multimodal AI research in low-resource languages such as Hausa, though its contribution is incremental because it adapts existing data rather than collecting new data.

The paper introduces HaVQA, the first multimodal dataset for visual question-answering in Hausa, created by manually translating 6,022 English question-answer pairs from Visual Genome images, resulting in 12,044 parallel sentences with semantic alignment to visuals.

This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold standard English-Hausa parallel sentences (one question and one answer per pair) that were translated in a fashion that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, and text-only and multimodal machine translation.
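The relationship between the 6,022 question-answer pairs and the 12,044 parallel sentences can be illustrated with a minimal sketch. The record layout below (`VQAPair` and its field names) and the helper `parallel_sentences` are hypothetical and chosen only for illustration; they are not the dataset's actual file format, and the Hausa strings are placeholders rather than real translations.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VQAPair:
    """Hypothetical record: one question-answer pair tied to one image."""
    image_id: int      # identifier of the associated image (assumed field)
    question_en: str   # English question
    answer_en: str     # English answer
    question_ha: str   # Hausa translation of the question
    answer_ha: str     # Hausa translation of the answer


def parallel_sentences(pairs: List[VQAPair]) -> List[Tuple[str, str]]:
    """Flatten QA pairs into (English, Hausa) sentence pairs.

    Each pair contributes two parallel sentences: its question and its answer.
    """
    sentences = []
    for p in pairs:
        sentences.append((p.question_en, p.question_ha))
        sentences.append((p.answer_en, p.answer_ha))
    return sentences


# Placeholder records standing in for the 6,022 translated pairs.
N_PAIRS = 6022
dummy_pairs = [
    VQAPair(
        image_id=i,
        question_en="What is on the table?",
        answer_en="A cup.",
        question_ha="<Hausa translation of the question>",
        answer_ha="<Hausa translation of the answer>",
    )
    for i in range(N_PAIRS)
]

n_parallel = len(parallel_sentences(dummy_pairs))
print(n_parallel)
```

Under this assumed layout, every pair yields exactly two parallel sentences, which is consistent with the abstract's figures: 6,022 pairs × 2 = 12,044.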
