CL May 28, 2021

Towards More Equitable Question Answering Systems: How Much More Data Do You Need?

arXiv:2105.14115v1712 citations, Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of building more equitable QA systems across languages. Its contribution is incremental: it focuses on optimizing existing methods rather than introducing new paradigms.

The study analyzes how to build multilingual question answering systems efficiently. It evaluates few-shot approaches combined with data augmentation through automatic translation and permutations of context-question-answer pairs, with the aim of making the most of existing resources and guiding future dataset creation toward broader language coverage.

Question answering (QA) in English has been widely explored, but multilingual datasets are relatively new, with several methods attempting to bridge the gap between high- and low-resourced languages using data augmentation through translation and cross-lingual transfer. In this project, we take a step back and study which approaches allow us to take the most advantage of existing resources in order to produce QA systems in many languages. Specifically, we perform extensive analysis to measure the efficacy of few-shot approaches augmented with automatic translations and permutations of context-question-answer pairs. In addition, we make suggestions for future dataset development efforts that make better use of a fixed annotation budget, with a goal of increasing the language coverage of QA datasets and systems. Code and data for reproducing our experiments are available here: https://github.com/NavidRajabi/EMQA.
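To make the idea of "permutations of context-question-answer pairs" concrete, here is a minimal sketch of one way such an augmentation could work. It is not taken from the paper or the EMQA repository. The function `permute_languages`, the `translate` callback, and the field names are all hypothetical. It also assumes the answer is always kept in the same language as the context, a simplification for extractive QA that the abstract does not confirm.

```python
from itertools import product
from typing import Callable, Dict, List

Example = Dict[str, str]


def permute_languages(
    example: Example,
    languages: List[str],
    translate: Callable[[str, str, str], str],
) -> List[Example]:
    """Build one example per (context language, question language) pair.

    `translate(text, src, tgt)` is a hypothetical stand-in for any
    machine translation system. The answer follows the context language
    (an assumption of this sketch, not a claim about the paper).
    """
    src = example["lang"]

    def in_lang(text: str, tgt: str) -> str:
        # Keep the original text when no translation is needed.
        return text if tgt == src else translate(text, src, tgt)

    augmented = []
    for ctx_lang, q_lang in product(languages, repeat=2):
        augmented.append(
            {
                "context": in_lang(example["context"], ctx_lang),
                "question": in_lang(example["question"], q_lang),
                "answer": in_lang(example["answer"], ctx_lang),
                "context_lang": ctx_lang,
                "question_lang": q_lang,
            }
        )
    return augmented


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Placeholder "translation" that only tags the text; swap in a real MT system.
    return f"[{src}->{tgt}] {text}"


seed = {
    "lang": "en",
    "context": "Paris is the capital of France.",
    "question": "What is the capital of France?",
    "answer": "Paris",
}
augmented = permute_languages(seed, ["en", "de", "hi"], toy_translate)
print(len(augmented))  # one example per (context, question) language pair
```

With three languages, a single annotated example expands into nine context-question combinations. In practice, an independently translated answer may no longer appear verbatim in the translated context, so a real pipeline would need an answer-alignment step that this sketch omits.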

Code Implementations: 1 repo
