CL · May 28, 2023

HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

arXiv:2305.17690v1226 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the lack of resources for multimodal AI research in low-resource languages such as Hausa, though the contribution is incremental because it adapts existing data.

The paper introduces HaVQA, the first multimodal dataset for visual question-answering in Hausa, created by manually translating 6,022 English question-answer pairs from Visual Genome images, resulting in 12,044 parallel sentences with semantic alignment to visuals.

This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold-standard English-Hausa parallel sentences (one per question and one per answer), translated in a fashion that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, covering visual question answering, visual question elicitation, and both text-only and multimodal machine translation.
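The relationship between the 6,022 question-answer pairs and the 12,044 parallel sentences can be illustrated with a minimal sketch. The record layout below (`QAPair` and its field names) and the helper functions are hypothetical, chosen only for illustration; the actual file format of the released dataset is not described here. The placeholder strings stand in for real Hausa text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QAPair:
    """Hypothetical record: one English QA pair plus its Hausa translation."""
    image_id: int      # identifier of the associated Visual Genome image (assumed field)
    question_en: str
    answer_en: str
    question_ha: str
    answer_ha: str


def parallel_sentences(pairs: List[QAPair]) -> List[Tuple[str, str]]:
    """Flatten QA pairs into (English, Hausa) sentence pairs.

    Each QA pair contributes two parallel sentences:
    one for the question and one for the answer.
    """
    out: List[Tuple[str, str]] = []
    for p in pairs:
        out.append((p.question_en, p.question_ha))
        out.append((p.answer_en, p.answer_ha))
    return out


def expected_parallel_count(num_qa_pairs: int) -> int:
    """Two parallel sentences (question + answer) per QA pair."""
    return 2 * num_qa_pairs


if __name__ == "__main__":
    # Placeholder content only; not taken from the dataset.
    sample = [
        QAPair(1, "What is on the table?", "A cup.",
               "<Hausa question>", "<Hausa answer>"),
    ]
    print(len(parallel_sentences(sample)))
    print(expected_parallel_count(6022))
```

Under this layout, the same flattened sentence pairs could feed the text-only translation baseline, while keeping `image_id` alongside them would be what a multimodal setup needs.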

Code Implementations: 1 repo