CL, Nov 16, 2023

TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction

CMU
arXiv:2311.09562v3, 38 citations, h-index: 19
Originality: Synthesis-oriented
AI Analysis

This work addresses evaluation inconsistencies and biases in event extraction, which matters to NLP researchers, though it is incremental in that it builds on existing datasets and methods.

The authors tackled evaluation issues in event extraction by creating TextEE, a standardized benchmark with 16 datasets and 14 methods, revealing that large language models struggle to achieve satisfactory performance.

Event extraction has gained considerable interest due to its wide-ranging applications. However, recent studies draw attention to evaluation issues, suggesting that reported scores may not accurately reflect the true performance. In this work, we identify and address evaluation challenges, including inconsistency due to varying data assumptions or preprocessing steps, the insufficiency of current evaluation frameworks that may introduce dataset or data split bias, and the low reproducibility of some previous approaches. To address these challenges, we present TextEE, a standardized, fair, and reproducible benchmark for event extraction. TextEE comprises standardized data preprocessing scripts and splits for 16 datasets spanning eight diverse domains and includes 14 recent methodologies, conducting a comprehensive benchmark reevaluation. We also evaluate five varied large language models on our TextEE benchmark and demonstrate how they struggle to achieve satisfactory performance. Inspired by our reevaluation results and findings, we discuss the role of event extraction in the current NLP era, as well as future challenges and insights derived from TextEE. We believe TextEE, the first standardized comprehensive benchmarking tool, will significantly facilitate future event extraction research.

Code Implementations: 1 repo
