CL | May 1, 2022

ELQA: A Corpus of Metalinguistic Questions and Answers about English

arXiv:2205.00395v2 | 225 citations | h-index: 42
Originality: Synthesis-oriented
AI Analysis

This dataset enables research into the metalinguistic abilities of NLU models and supports educational applications for language learning. It is an incremental contribution: a new resource for a specific domain.

The authors introduce ELQA, a corpus of more than 70k metalinguistic questions about English, with their answers, collected from two online forums and covering topics such as grammar, meaning, fluency, and etymology. They define a free-form QA task on the corpus and evaluate multiple LLMs on it to analyze their metalinguistic capabilities.

We present ELQA, a corpus of questions and answers in and about the English language. Collected from two online forums, the >70k questions (from English learners and others) cover wide-ranging topics including grammar, meaning, fluency, and etymology. The answers include descriptions of general properties of English vocabulary and grammar as well as explanations about specific (correct and incorrect) usage examples. Unlike most NLP datasets, this corpus is metalinguistic -- it consists of language about language. As such, it can facilitate investigations of the metalinguistic capabilities of NLU models, as well as educational applications in the language learning domain. To study this, we define a free-form question answering task on our dataset and conduct evaluations on multiple LLMs (Large Language Models) to analyze their capacity to generate metalinguistic answers.

Code Implementations: 1 repo
