CEntRE: A paragraph-level Chinese dataset for Relation Extraction among Enterprises
The dataset addresses the scarcity of data for enterprise relation extraction, which is crucial for applications such as risk analysis. The contribution is incremental, however, because it focuses on dataset creation rather than method innovation.
The authors tackled the lack of datasets for enterprise relation extraction by introducing CEntRE, a new Chinese dataset from business news, and found that six state-of-the-art models performed poorly on it, highlighting its difficulty.
Enterprise relation extraction aims to detect pairs of enterprise entities and identify the business relations between them from unstructured or semi-structured text. It is crucial for several real-world applications, such as risk analysis, rating research, and supply chain security. However, previous work mainly focuses on extracting attribute information about enterprises, such as personnel and corporate business, and pays little attention to relations between enterprises. To encourage further progress in this area, we introduce CEntRE, a new dataset constructed from publicly available business news data with careful human annotation and intelligent data processing. Extensive experiments on CEntRE with six strong models demonstrate the challenges posed by the proposed dataset.
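As a rough illustration of the task defined above, the following minimal sketch shows one way to represent an extracted relation and to enumerate the candidate enterprise pairs that a model would have to classify. The names `RelationTriple` and `candidate_pairs`, the placeholder company names, and the relation label are hypothetical and are not taken from CEntRE or its annotation scheme.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple


@dataclass(frozen=True)
class RelationTriple:
    """One extracted fact: (head enterprise, relation, tail enterprise)."""
    head: str      # enterprise mention acting as the subject
    tail: str      # enterprise mention acting as the object
    relation: str  # hypothetical label, not a CEntRE relation type


def candidate_pairs(enterprises: List[str]) -> List[Tuple[str, str]]:
    """Return every ordered pair of distinct enterprises in a paragraph.

    Ordered pairs are used because a business relation may be directional.
    Repeated mentions are collapsed while keeping first-seen order.
    """
    unique = list(dict.fromkeys(enterprises))
    return list(permutations(unique, 2))


# Placeholder mentions, as if found in a single news paragraph.
mentions = ["Company A", "Company B", "Company A"]
pairs = candidate_pairs(mentions)

# A model would assign a relation (or "no relation") to each pair;
# here one triple is filled in by hand purely for illustration.
example = RelationTriple(head="Company A", tail="Company B", relation="hypothetical_relation")
```

The number of candidate pairs grows quadratically with the number of enterprises mentioned, which is one reason paragraph-level extraction can be harder than sentence-level extraction.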