CL AI IR Jun 21, 2015

Extreme Extraction: Only One Hour per Relation

arXiv:1506.06418v1, 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses the manual-engineering bottleneck in information extraction, which affects applications such as knowledge base construction. Its contribution is an incremental gain in authoring efficiency rather than a new extraction paradigm.

The paper tackles the problem of high expert labor required for information extraction by introducing InstaRead, a system that reduces the effort to under one hour per relation while achieving performance equal to or better than supervised and distantly supervised methods.

Information Extraction (IE) aims to automatically generate a large knowledge base from natural language text, but progress remains slow. Supervised learning requires copious human annotation, while unsupervised and weakly supervised approaches do not deliver competitive accuracy. As a result, most fielded applications of IE, as well as the leading TAC-KBP systems, rely on significant amounts of manual engineering. Even "Extreme" methods, such as those reported in Freedman et al. 2011, require about 10 hours of expert labor per relation. This paper shows how to reduce that effort by an order of magnitude. We present a novel system, InstaRead, that streamlines authoring with an ensemble of methods: 1) encoding extraction rules in an expressive and compositional representation, 2) guiding the user to promising rules based on corpus statistics and mined resources, and 3) introducing a new interactive development cycle that provides immediate feedback --- even on large datasets. Experiments show that experts can create quality extractors in under an hour and even NLP novices can author good extractors. These extractors equal or outperform ones obtained by comparably supervised and state-of-the-art distantly supervised approaches.
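To make the first ingredient, a compositional rule representation, more concrete, here is a minimal Python sketch. It is not InstaRead's actual rule language, which the abstract does not specify. The edge format, the `Rule` class, and the helpers `verb_rule` and `extract` are all hypothetical names chosen for illustration. The sketch assumes sentences arrive as dependency edges and shows how small rules can be combined into a larger extractor.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical input format: one (head, label, dependent) triple per dependency edge.
Edge = Tuple[str, str, str]
Pair = Tuple[str, str]


@dataclass(frozen=True)
class Rule:
    """A rule maps the edges of one sentence to candidate (arg1, arg2) pairs."""
    match: Callable[[List[Edge]], Iterable[Pair]]

    def __or__(self, other: "Rule") -> "Rule":
        # Composition by union: a pair is extracted if either rule fires.
        def either(edges: List[Edge]) -> Iterable[Pair]:
            yield from self.match(edges)
            yield from other.match(edges)
        return Rule(either)


def verb_rule(verb: str) -> Rule:
    """Illustrative rule: fires when `verb` has both a subject and a direct object."""
    def match(edges: List[Edge]) -> Iterable[Pair]:
        subjects = [d for h, label, d in edges if h == verb and label == "nsubj"]
        objects = [d for h, label, d in edges if h == verb and label == "dobj"]
        for s in subjects:
            for o in objects:
                yield (s, o)
    return Rule(match)


def extract(rule: Rule, edges: List[Edge]) -> List[Pair]:
    """Apply a rule to one sentence and return the de-duplicated, sorted pairs."""
    return sorted(set(rule.match(edges)))


# A toy "founded" relation built from two simpler rules.
founded = verb_rule("founded") | verb_rule("established")
edges = [("established", "nsubj", "Gates"), ("established", "dobj", "Microsoft")]
print(extract(founded, edges))  # [('Gates', 'Microsoft')]
```

Because each rule is a plain value, an author can start with one narrow pattern, run it, and widen it with `|` as new phrasings turn up. That loosely mirrors the iterative authoring cycle the abstract describes, though the real system's feedback loop and rule suggestions are not modeled here.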
