CL May 10, 2021

DocOIE: A Document-level Context-Aware Dataset for OpenIE

arXiv:2105.04271v2716 citations
Originality: Incremental advance
AI Analysis

This work addresses the lack of document-level context in OpenIE, but the advance is incremental: it builds on existing sentence-level methods by adding context-aware features.

The paper addresses the lack of document-level context in Open Information Extraction (OpenIE), which is often needed to interpret a sentence accurately. It introduces a manually annotated dataset of 800 sentences from 80 documents in the Healthcare and Transportation domains, along with a novel model that performs better when such context is incorporated.

Open Information Extraction (OpenIE) aims to extract structured relational tuples (subject, relation, object) from sentences and plays a critical role in many downstream NLP applications. Existing solutions perform extraction at the sentence level, without referring to any additional contextual information. In reality, however, a sentence typically exists as part of a document rather than standalone; we often need to access relevant contextual information around the sentence before we can accurately interpret it. As there is no document-level context-aware OpenIE dataset available, we manually annotate 800 sentences from 80 documents in two domains (Healthcare and Transportation) to form a DocOIE dataset for evaluation. In addition, we propose DocIE, a novel document-level context-aware OpenIE model. Our experimental results based on DocIE demonstrate that incorporating document-level context helps improve OpenIE performance. Both the DocOIE dataset and the DocIE model are publicly released.

Code Implementations: 1 repo
