CL · Mar 20, 2023

DocRED-FE: A Document-Level Fine-Grained Entity And Relation Extraction Dataset

Hongbo Wang, Weimin Xiong, Yifan Song, Dawei Zhu, Yu Xia, Sujian Li

Peking University

arXiv:2303.11141v21.34 citations · h-index: 100 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a limitation of real-world information extraction by giving researchers a more detailed dataset, though the contribution is incremental because it builds on an existing dataset (DocRED).

The authors address the lack of document-level, fine-grained joint entity and relation extraction datasets by constructing DocRED-FE, a large-scale dataset with hierarchical entity types. They find that it is challenging for existing models and that its fine-grained types benefit relation classification.

Joint entity and relation extraction (JERE) is one of the most important tasks in information extraction. However, most existing work focuses on sentence-level, coarse-grained JERE, which has limitations in real-world scenarios. In this paper, we construct DocRED-FE, a large-scale document-level fine-grained JERE dataset that extends DocRED with fine-grained entity types. Specifically, we redesign a hierarchical entity type schema comprising 11 coarse-grained types and 119 fine-grained types, and then manually re-annotate DocRED according to this schema. Through comprehensive experiments, we find that (1) DocRED-FE is challenging for existing JERE models, and (2) our fine-grained entity types benefit relation classification. We make DocRED-FE, together with instructions and the code for our baselines, publicly available at https://github.com/PKU-TANGENT/DOCRED-FE.
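The hierarchical schema described in the abstract can be pictured as a mapping from each fine-grained type to its coarse-grained parent. The sketch below is illustrative only. The type names (`PER/politician`, `LOC/city`, ...), the `TypeHierarchy` class, the `collapse` helper, and the mention-level exact-match F1 are hypothetical choices made for this example. They are not the DocRED-FE label set or its official evaluation script. The example shows how one fine-grained annotation can be scored at both granularities.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical two-level schema with placeholder names, NOT the real
# DocRED-FE labels (the actual schema has 11 coarse and 119 fine types).
SCHEMA: dict[str, list[str]] = {
    "PER": ["PER/politician", "PER/athlete", "PER/artist"],
    "ORG": ["ORG/company", "ORG/sports_team", "ORG/government"],
    "LOC": ["LOC/city", "LOC/country", "LOC/river"],
}


@dataclass
class TypeHierarchy:
    # Maps each fine-grained type to its single coarse-grained parent.
    parent: dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_schema(cls, schema: dict[str, list[str]]) -> TypeHierarchy:
        parent: dict[str, str] = {}
        for coarse, fines in schema.items():
            for fine in fines:
                # A tree-shaped schema gives every fine type exactly one parent.
                if fine in parent:
                    raise ValueError(f"fine type {fine!r} listed under two parents")
                parent[fine] = coarse
        return cls(parent)

    def coarse_of(self, fine: str) -> str:
        # Raises KeyError for labels outside the schema.
        return self.parent[fine]


def typed_f1(gold: set[tuple[int, int, str]], pred: set[tuple[int, int, str]]) -> float:
    """Micro F1 over (start, end, type) tuples with exact-match scoring (assumed metric)."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def collapse(h: TypeHierarchy, ents: set[tuple[int, int, str]]) -> set[tuple[int, int, str]]:
    # Replace each fine-grained label with its coarse-grained parent.
    return {(s, e, h.coarse_of(t)) for s, e, t in ents}


hier = TypeHierarchy.from_schema(SCHEMA)
gold = {(0, 2, "PER/politician"), (5, 6, "LOC/city")}
pred = {(0, 2, "PER/athlete"), (5, 6, "LOC/city")}

print(typed_f1(gold, pred))                                  # 0.5
print(typed_f1(collapse(hier, gold), collapse(hier, pred)))  # 1.0
```

Here the prediction confuses two fine-grained person types. It therefore loses credit at the fine level but is fully correct at the coarse level. This is one way a two-level schema makes errors more informative than a flat label set. Note that the sketch scores individual mentions, whereas DocRED-style data groups mentions into document-level entities.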
