CL · Mar 20, 2023

DocRED-FE: A Document-Level Fine-Grained Entity And Relation Extraction Dataset

Peking University
arXiv:2303.11141v2 · 4 citations · h-index: 100 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a limitation of real-world information extraction by providing a more detailed dataset for researchers, though it is incremental, since it builds on an existing dataset.

The authors tackled the lack of document-level, fine-grained joint entity and relation extraction datasets by constructing DocRED-FE, a large-scale dataset with hierarchical entity types. They found the dataset challenging for existing models and its fine-grained entity types beneficial for relation classification.

Joint entity and relation extraction (JERE) is one of the most important tasks in information extraction. However, most existing work focuses on sentence-level, coarse-grained JERE, which is limited in real-world scenarios. In this paper, we construct DocRED-FE, a large-scale document-level fine-grained JERE dataset that extends DocRED with fine-grained entity types. Specifically, we redesign a hierarchical entity type schema comprising 11 coarse-grained types and 119 fine-grained types, and then manually re-annotate DocRED according to this schema. Through comprehensive experiments, we find that (1) DocRED-FE is challenging for existing JERE models, and (2) our fine-grained entity types benefit relation classification. We make DocRED-FE, together with its annotation instructions and the code for our baselines, publicly available at https://github.com/PKU-TANGENT/DOCRED-FE.
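To make the two-level schema more concrete, here is a minimal Python sketch of how a coarse-to-fine entity type hierarchy might be represented. It assumes that every fine-grained type has exactly one coarse-grained parent. The class `TypeSchema` and all type labels below are hypothetical placeholders chosen for illustration; they are not the actual 11 coarse-grained and 119 fine-grained types of DocRED-FE.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TypeSchema:
    """Hypothetical two-level entity type hierarchy (coarse -> fine).

    Assumption: every fine-grained type has exactly one coarse-grained parent.
    """

    parent: dict[str, str] = field(default_factory=dict)  # fine type -> coarse type

    def add(self, coarse: str, fine: str) -> None:
        """Register a fine-grained type under its coarse parent."""
        existing = self.parent.get(fine)
        if existing is not None and existing != coarse:
            raise ValueError(f"{fine!r} is already under {existing!r}")
        self.parent[fine] = coarse

    def coarse_of(self, fine: str) -> str:
        """Map a fine-grained label back to its coarse-grained label."""
        return self.parent[fine]

    def coarse_types(self) -> set[str]:
        """All coarse-grained types that have at least one fine type."""
        return set(self.parent.values())

    def fine_types(self, coarse: str) -> list[str]:
        """All fine-grained types under one coarse type, sorted."""
        return sorted(f for f, c in self.parent.items() if c == coarse)


if __name__ == "__main__":
    # Placeholder labels for illustration only; not DocRED-FE's real type names.
    schema = TypeSchema()
    schema.add("person", "person.athlete")
    schema.add("person", "person.politician")
    schema.add("location", "location.city")

    # A fine-grained annotation can always be collapsed to its coarse type,
    # so predictions could be scored at either granularity.
    mention = {"text": "Springfield", "type": "location.city"}
    print(mention["text"], "->", schema.coarse_of(mention["type"]))
```

Because the hierarchy is stored as a single fine-to-coarse map, collapsing a fine-grained label to its coarse parent is one dictionary lookup, and the single-parent assumption is enforced when types are registered.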

Code Implementations: 1 repo
