CL AI Nov 26, 2021

Predicting Document Coverage for Relation Extraction

arXiv:2111.13611v1632 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of selecting the best documents for relation extraction from large corpora. The advance is incremental: it introduces a new task but builds on existing methods.

The paper tackles the problem of predicting whether a document contains many relational tuples for a given entity to aid in knowledge base construction, achieving an F1 score of up to 46% with a model combining features and BERT.

This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim refutation.
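To make the task framing concrete, the following is a minimal sketch of coverage prediction as binary classification over (document, entity) pairs, scored with F1. It is not the HERB model from the paper: the `Doc` class, the `surface_features` and `predict` helpers, and the mention-count threshold are all hypothetical stand-ins, using only two of the surface features the abstract lists (length and entity mention frequency).

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """A (document, entity) pair with a hypothetical gold coverage label."""
    text: str
    entity: str
    high_coverage: bool


def surface_features(doc: Doc) -> dict:
    """Compute two simple surface features: token length and mention frequency."""
    lowered = doc.text.lower()
    return {
        "length": len(lowered.split()),
        "mention_freq": lowered.count(doc.entity.lower()),
    }


def predict(doc: Doc, min_mentions: int = 2) -> bool:
    """Assumed baseline: call a document high-coverage if the entity is mentioned often."""
    return surface_features(doc)["mention_freq"] >= min_mentions


def f1_score(gold: list, pred: list) -> float:
    """F1 for the positive (high-coverage) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A single thresholded feature like this is exactly the kind of signal the abstract describes as having only moderate predictive power, which is the motivation for combining such features with a language model.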

