CL · May 3, 2023

NorQuAD: Norwegian Question Answering Dataset

arXiv:2305.01957v1 · 256 citations
Originality: Synthesis-oriented
AI Analysis

NorQuAD provides a new benchmark for Norwegian NLP, filling a gap in resources for machine reading comprehension in that language.

The authors introduced NorQuAD, the first Norwegian question answering dataset for machine reading comprehension, consisting of 4,752 manually created question-answer pairs, and benchmarked multilingual and monolingual language models against human performance.

In this paper we present NorQuAD: the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We here detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available.

Code Implementations: 1 repo