CL · Feb 20, 2023

90% F1 Score in Relational Triple Extraction: Is it Real?

arXiv:2302.09887v2 · 3 citations · h-index: 23
AI Analysis

This work addresses inflated benchmark scores in relational triple extraction, a key task for knowledge base construction, showing that reported F1 scores drop substantially once sentences with zero triples are included in the evaluation.

The paper tackles the problem of overestimated performance in relational triple extraction by evaluating state-of-the-art models under a more realistic setting that includes sentences with zero triples, revealing a significant decline of 6-15% in F1 scores across datasets. It proposes a two-step BERT-based approach that improves performance in this setting.

Extracting relational triples from text is a crucial task for constructing knowledge bases. Recent advancements in joint entity and relation extraction models have demonstrated remarkable F1 scores ($\ge 90\%$) in accurately extracting relational triples from free text. However, these models have been evaluated under restrictive experimental settings and unrealistic datasets. They overlook sentences with zero triples (zero-cardinality), thereby simplifying the task. In this paper, we present a benchmark study of state-of-the-art joint entity and relation extraction models under a more realistic setting. We include sentences that lack any triples in our experiments, providing a comprehensive evaluation. Our findings reveal a significant decline (approximately 10-15% in one dataset and 6-14% in another dataset) in the models' F1 scores within this realistic experimental setup. Furthermore, we propose a two-step modeling approach that utilizes a simple BERT-based classifier. This approach leads to overall performance improvement in these models within the realistic experimental setting.
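The effect described in the abstract can be illustrated with a minimal sketch. It assumes micro-averaged F1 over exactly matched triples, which the abstract does not specify, and it uses two toy sentences invented purely for illustration (they are not data or results from the paper). The names `micro_f1`, `two_step_extract`, `toy_extract`, and `toy_gate` are hypothetical; `toy_gate` is only a stand-in for the BERT-based classifier and is not the authors' implementation.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def micro_f1(gold: List[Set[Triple]], pred: List[Set[Triple]]) -> float:
    """Micro-averaged F1 over per-sentence triple sets (exact match, assumed)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def two_step_extract(
    sentences: List[str],
    has_triple: Callable[[str], bool],
    extract: Callable[[str], Set[Triple]],
) -> List[Set[Triple]]:
    """Step 1: gate each sentence; step 2: extract only from accepted ones."""
    return [extract(s) if has_triple(s) else set() for s in sentences]


# Toy data, invented for illustration only.
sentences = ["Paris is the capital of France.", "It rained all day."]
gold: List[Set[Triple]] = [{("Paris", "capital_of", "France")}, set()]

# Hypothetical extractor that always emits a triple, even when none exists.
toy_output = {
    sentences[0]: {("Paris", "capital_of", "France")},
    sentences[1]: {("It", "located_in", "day")},  # spurious triple
}


def toy_extract(sentence: str) -> Set[Triple]:
    return toy_output[sentence]


def toy_gate(sentence: str) -> bool:
    """Stand-in for a binary 'contains at least one triple' classifier."""
    return sentence == sentences[0]


pred = [toy_extract(s) for s in sentences]

# Restrictive setting: zero-triple sentences are dropped before scoring.
f1_restrictive = micro_f1(gold[:1], pred[:1])
# Realistic setting: zero-triple sentences are scored as well.
f1_realistic = micro_f1(gold, pred)
# Two-step setting: the gate suppresses extraction on rejected sentences.
f1_two_step = micro_f1(gold, two_step_extract(sentences, toy_gate, toy_extract))

print(f1_restrictive, round(f1_realistic, 3), f1_two_step)
```

In this toy case the spurious triple on the zero-triple sentence lowers precision, and therefore F1, only in the realistic setting. A gate that correctly rejects that sentence removes the false positive, while a gate that wrongly rejects a sentence containing triples would cost recall instead.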

