CL · Jan 30, 2019

Span Model for Open Information Extraction on Accurate Corpus

arXiv:1901.10879v6 · 90 citations
Originality: Incremental advance
AI Analysis

This work addresses data quality issues in open information extraction, offering incremental improvements for NLP researchers.

The paper tackles the challenge of open information extraction by improving training data exploitation and introducing a re-annotated test set, with a span model achieving new state-of-the-art performance on benchmark datasets.

Open information extraction (Open IE) is a challenging task, especially because of its brittle data basis. Most Open IE systems have to be trained on an automatically built corpus and evaluated on an inaccurate test set. In this work, we first alleviate this difficulty on both the training and the test side. For the former, we propose an improved model design that exploits the training dataset more fully. For the latter, we present our accurately re-annotated benchmark test set (Re-OIE6), built according to a series of linguistic observations and analyses. Then, we introduce a span model in place of the previously adopted sequence labeling formulation for n-ary Open IE. Our newly introduced model achieves new state-of-the-art performance on both benchmark evaluation datasets.
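
To make the contrast between the two formulations concrete, here is a minimal sketch and not the authors' implementation. It assumes a sequence labeling system emits one BIO tag per token, while a span-based system scores whole candidate spans directly. The function names, the role labels (A0, P, A1), and the example sentence are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def enumerate_spans(tokens: List[str], max_width: int) -> List[Span]:
    """List every contiguous span of at most max_width tokens.

    A span-based model would score each candidate as a predicate or
    argument; the scoring step is omitted here.
    """
    n = len(tokens)
    return [(i, j) for i in range(n) for j in range(i, min(i + max_width, n))]


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode per-token BIO tags into (label, start, end) spans.

    This is the post-processing a sequence labeling formulation needs
    before it yields the spans that a span model predicts directly.
    """
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and tag != "I-" + label:
            spans.append((label, start, i - 1))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical example sentence and role labels (assumptions, not from the paper).
tokens = ["Obama", "was", "born", "in", "Hawaii"]
tags = ["B-A0", "B-P", "I-P", "B-A1", "I-A1"]

candidates = enumerate_spans(tokens, max_width=2)
decoded = bio_to_spans(tags)
```

In this sketch the number of candidate spans grows with both sentence length and `max_width`, which is why span-based systems typically cap the span width or prune candidates before scoring.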

Code Implementations: 1 repo
