Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge
This dataset provides a more realistic and challenging benchmark for direct-answer question answering, particularly for researchers working on complex reasoning and explanation generation, by converting a multiple-choice reasoning challenge into an open-response format.
The authors introduce ARC-DA, a direct-answer version of the ARC multiple-choice dataset, comprising 2985 questions with 8436 valid answers. The dataset aims to address the limitations of the multiple-choice format, which is unrepresentative of real-world questions and susceptible to artifacts. The best-performing models reach 81% GENIE, 61.4% F1, and 63.2% ROUGE-L, leaving considerable room for improvement.
We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple-choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting questions to direct-answer format using a combination of crowdsourcing and expert review. The resulting dataset contains 2985 questions with a total of 8436 valid answers (questions typically have more than one valid answer). ARC-DA is one of the first DA datasets of natural questions that often require reasoning, and where appropriate question decompositions are not evident from the questions themselves. We describe the conversion approach taken, appropriate evaluation metrics, and several strong models. Although high, the best scores (81% GENIE, 61.4% F1, 63.2% ROUGE-L) still leave considerable room for improvement. In addition, the dataset provides a natural setting for new research on explanation, as many questions require reasoning to construct answers. We hope the dataset spurs further advances in complex question-answering by the community. ARC-DA is available at https://allenai.org/data/arc-da
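
Because each question typically has more than one valid answer (8436 answers over 2985 questions, roughly 2.8 per question), an overlap metric such as F1 needs a rule for combining scores across the gold answers. The sketch below shows one common convention: compute a SQuAD-style token F1 against each valid answer and keep the maximum. This is an illustration under stated assumptions, not the authors' scoring script. The normalization rules, the function names (normalize, token_f1, best_f1), and the example answers are hypothetical and do not come from the dataset.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase, replace punctuation with spaces, drop articles, split into tokens.

    Assumption: SQuAD-style normalization; the official ARC-DA scorer may differ.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, gold):
    """Token-level F1 between one predicted answer and one gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction, valid_answers):
    """Score a prediction against every valid answer and keep the best F1."""
    return max(token_f1(prediction, gold) for gold in valid_answers)


# Hypothetical question with two valid answers (not taken from ARC-DA).
valid_answers = ["evaporation", "the water evaporated"]
score = best_f1("evaporation of water", valid_answers)
print(f"{score:.2f}")
```

Taking the maximum means a prediction is not penalized for matching only one of several acceptable phrasings, which matters when valid answers to the same question share few tokens.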