CL · Dec 13, 2016

Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors

arXiv:1612.04342v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for large-scale datasets and improved models in machine reading comprehension, though it is incremental in nature.

The paper tackles machine reading comprehension by building a dataset of around 2 million examples with paragraph-vector models and by introducing a hybrid neural-network architecture, which reaches 83.2% accuracy against an empirically determined human ceiling of around 91%.

We present a dual contribution to the task of machine reading-comprehension: a technique for creating large-sized machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks. We use the MC-dataset generation technique to build a dataset of around 2 million examples, for which we empirically determine the high-ceiling of human performance (around 91% accuracy), as well as the performance of a variety of computer models. Among all the models we have experimented with, our hybrid neural-network architecture achieves the highest performance (83.2% accuracy). The remaining gap to the human-performance ceiling provides enough room for future model improvements.
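As a rough illustration of what "combining a recurrent encoder with a fully-connected discriminative network" can look like, here is a minimal NumPy sketch. It is not the authors' architecture: the plain tanh RNN, the two-layer scorer, the softmax over candidate answers, the random weights, and every name and dimension below (`rnn_encode`, `mlp_score`, `score_candidates`, `init_params`) are assumptions made for illustration only.

```python
import numpy as np


def rnn_encode(token_vectors, W_in, W_rec, b):
    """Run a plain tanh RNN over a sequence and return the final hidden state."""
    h = np.zeros(W_rec.shape[0])
    for x in token_vectors:
        h = np.tanh(W_in @ x + W_rec @ h + b)
    return h


def mlp_score(features, W1, b1, W2, b2):
    """Score a feature vector with one ReLU hidden layer and a linear output."""
    hidden = np.maximum(0.0, W1 @ features + b1)
    return float(W2 @ hidden + b2)


def score_candidates(passage, candidates, params):
    """Return a probability for each candidate answer given the passage.

    Hypothetical setup: the passage and each candidate are encoded by the
    same RNN, the two encodings are concatenated, and a fully-connected
    network scores the pair. A softmax normalizes scores across candidates.
    """
    passage_vec = rnn_encode(passage, *params["rnn"])
    scores = np.array([
        mlp_score(
            np.concatenate([passage_vec, rnn_encode(cand, *params["rnn"])]),
            *params["mlp"],
        )
        for cand in candidates
    ])
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exp / exp.sum()


def init_params(embed_dim=8, hidden_dim=16, mlp_dim=32, seed=0):
    """Create small random weights; a real model would learn these by training."""
    rng = np.random.default_rng(seed)
    rnn = (
        rng.normal(0.0, 0.1, (hidden_dim, embed_dim)),
        rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),
        np.zeros(hidden_dim),
    )
    mlp = (
        rng.normal(0.0, 0.1, (mlp_dim, 2 * hidden_dim)),
        np.zeros(mlp_dim),
        rng.normal(0.0, 0.1, mlp_dim),
        0.0,
    )
    return {"rnn": rnn, "mlp": mlp}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    params = init_params()
    passage = rng.normal(size=(5, 8))                       # 5 tokens, 8-dim embeddings
    candidates = [rng.normal(size=(3, 8)) for _ in range(4)]  # 4 candidate answers
    print(score_candidates(passage, candidates, params))
```

With untrained weights the output is close to uniform over the candidates; the sketch only shows how the two components fit together, not how the reported 83.2% accuracy was obtained.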

Code Implementations: 1 repo
