CL, IR · Nov 28, 2016

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

arXiv:1611.09268v3 · 3410 citations
Originality: Synthesis-oriented
AI Analysis

This dataset addresses researchers' and practitioners' need for realistic, large-scale benchmarks in machine reading comprehension and question answering.

The authors introduced MS MARCO, a large-scale machine reading comprehension dataset with over 1 million real user questions and human-generated answers, derived from Bing search logs, to benchmark models on tasks like answerability prediction and passage ranking.

We introduce a large-scale MAchine Reading COmprehension dataset, which we name MS MARCO. The dataset comprises 1,010,916 anonymized questions (sampled from Bing's search query logs), each with a human-generated answer, and 182,669 completely human-rewritten generated answers. In addition, the dataset contains 8,841,823 passages (extracted from 3,563,535 web documents retrieved by Bing) that provide the information necessary for curating the natural language answers. A question in the MS MARCO dataset may have multiple answers or no answers at all. Using this dataset, we propose three different tasks with varying levels of difficulty: (i) predict whether a question is answerable given a set of context passages, and extract and synthesize the answer as a human would; (ii) generate a well-formed answer (if possible) based on the context passages that can be understood with the question and passage context; and finally (iii) rank a set of retrieved passages given a question. The size of the dataset and the fact that the questions are derived from real user search queries distinguish MS MARCO from other well-known publicly available datasets for machine reading comprehension and question answering. We believe that the scale and the real-world nature of this dataset make it attractive for benchmarking machine reading comprehension and question-answering models.
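To make the record structure and two of the tasks more concrete, here is a minimal sketch under stated assumptions. The class and field names (`MarcoRecord`, `Passage`, `is_selected`, `well_formed_answers`) are hypothetical and are not the official MS MARCO schema. Answerability for task (i) is modeled simply as "the record has at least one human answer", reflecting the abstract's note that a question may have multiple answers or none. The ranking function for task (iii) is a toy query-term-overlap baseline swapped in for illustration; it is not the method proposed or evaluated by the authors.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Passage:
    text: str
    # Hypothetical flag: whether this passage was used to write the answer.
    is_selected: bool = False


@dataclass
class MarcoRecord:
    # Hypothetical layout for one question; field names are illustrative only.
    query: str
    passages: list[Passage]
    # A question may have zero, one, or several human-generated answers.
    answers: list[str] = field(default_factory=list)
    # Rewritten, well-formed answers exist only for a subset of questions (task ii).
    well_formed_answers: list[str] = field(default_factory=list)

    def is_answerable(self) -> bool:
        # Task (i) label under this sketch: no human answer means unanswerable.
        return len(self.answers) > 0


def tokenize(text: str) -> set[str]:
    # Lowercase and keep alphanumeric runs; a deliberately crude tokenizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_passages(query: str, passages: list[Passage]) -> list[Passage]:
    """Toy stand-in for task (iii): order passages by query-term overlap."""
    query_terms = tokenize(query)
    # sorted() is stable even with reverse=True, so ties keep retrieval order.
    return sorted(
        passages,
        key=lambda p: len(query_terms & tokenize(p.text)),
        reverse=True,
    )


record = MarcoRecord(
    query="boiling point of water",
    passages=[
        Passage("Paris is the capital of France."),
        Passage(
            "The boiling point of water is 100 degrees Celsius at sea level.",
            is_selected=True,
        ),
    ],
    answers=["100 degrees Celsius at sea level"],
)

print(record.is_answerable())  # True
print(rank_passages(record.query, record.passages)[0].text)
```

In this example the second passage shares four query terms ("boiling", "point", "of", "water") while the first shares only "of", so the second passage is ranked first. Real systems evaluated on the benchmark use learned rankers rather than raw term overlap.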

Code implementations: 14 repos (data from Papers with Code, CC-BY-SA-4.0)

