CL · Jan 6, 2018

Analysis of Wikipedia-based Corpora for Question Answering

arXiv:1801.02073v2 · 6 citations
Originality: Synthesis-oriented
AI Analysis

This work gives natural language processing researchers insight into making better use of existing question answering datasets, though it is incremental: its focus is analysis rather than new methods.

The paper analyzes four Wikipedia-based question answering corpora (WikiQA, SelQA, SQuAD, InfoQA) with intrinsic and extrinsic methods, and introduces an indexing-based approach for creating a silver-standard dataset for answer retrieval. The analysis shows that each corpus is unique and suggests how to use them better for statistical question answering learning.

This paper gives comprehensive analyses of Wikipedia-based corpora for several question answering tasks. Four recent corpora, WikiQA, SelQA, SQuAD, and InfoQA, are collected and first analyzed intrinsically by contextual similarities, question types, and answer categories. These corpora are then analyzed extrinsically on three question answering tasks: answer retrieval, selection, and triggering. An indexing-based method for creating a silver-standard dataset for answer retrieval using the entire Wikipedia is also presented. Our analysis shows the uniqueness of these corpora and suggests a better use of them for statistical question answering learning.

