CL AI IR | Apr 23, 2020

Rapidly Bootstrapping a Question Answering Dataset for COVID-19

Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin

arXiv:2004.11339v1 5.071 citations

Originality: Synthesis-oriented

AI Analysis

CovidQA provides a stopgap evaluation resource for researchers working on COVID-19-related AI, though the contribution is incremental because it builds on existing datasets and methods.

The authors address the lack of a COVID-19-specific question answering dataset by creating CovidQA, a manually built dataset of 124 question-article pairs, and evaluate term-based and transformer-based baselines to gauge the zero-shot or transfer capabilities of existing models.

We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and intended as a stopgap measure for guiding research until more substantial evaluation resources become available. While this dataset, comprising 124 question-article pairs as of the present version 0.1 release, does not have sufficient examples for supervised machine learning, we believe that it can be helpful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and presents the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at http://covidqa.ai/
