Large Language Models for Document-Level Event-Argument Data Augmentation for Challenging Role Types
This work addresses data scarcity in event argument extraction for domains with limited training data, though the contribution is incremental because it builds on existing augmentation methods.
The paper tackles few-shot cross-domain event argument extraction by introducing two LLM-powered data augmentation frameworks, achieving a 16-point F1 increase on zero-shot role types and an improvement of up to 11 points on a new metric for cross-domain analysis.
Event Argument Extraction (EAE) is an extremely difficult information extraction problem, with significant limitations in few-shot cross-domain (FSCD) settings. A common solution to FSCD modeling is data augmentation. Unfortunately, existing augmentation methods are not well-suited to a variety of real-world EAE contexts, including (i) the need to model long documents (10+ sentences) and (ii) the need to model zero- and few-shot roles (i.e., event roles with little to no training representation). In this work, we introduce two novel LLM-powered data augmentation frameworks for synthesizing extractive document-level EAE samples using zero in-domain training data. Our highest-performing methods provide a 16-point increase in F1 score on extraction of zero-shot role types. To better facilitate analysis of cross-domain EAE, we additionally introduce a new metric, Role-Depth F1 (RDF1), which uses statistical depth to identify roles in the target domain that are semantic outliers with respect to roles observed in the source domain. Our experiments show that LLM-based augmentation can boost RDF1 performance by up to 11 F1 points compared to baseline methods.
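The idea behind a depth-based metric such as RDF1 can be illustrated with a minimal sketch. The abstract does not specify the depth function or how F1 is aggregated, so everything below is an assumption made for illustration: the depth is a simple angular one (2 minus the mean cosine distance to the source role embeddings), the F1 is micro-averaged over the selected roles, and the role names, embeddings, and counts are hypothetical toy values rather than data from the paper.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two vectors, in [0, 2]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def depth(x, source):
    """Assumed depth: 2 minus the mean cosine distance from x to the
    source embeddings. Lower depth means x is more of an outlier."""
    return 2.0 - sum(cosine_distance(x, s) for s in source) / len(source)

def outlier_roles(target, source, k):
    """Return the k target roles with the lowest depth w.r.t. the source."""
    ranked = sorted(target, key=lambda role: depth(target[role], source))
    return ranked[:k]

def role_depth_f1(counts, roles):
    """Micro-averaged F1 restricted to the given roles (an assumption).
    counts maps role -> (true positives, false positives, false negatives)."""
    tp = sum(counts[r][0] for r in roles)
    fp = sum(counts[r][1] for r in roles)
    fn = sum(counts[r][2] for r in roles)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical 2-d role embeddings for a source and a target domain.
source = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
target = {"attacker": [1.0, 0.1], "victim": [0.7, 0.3], "symptom": [0.0, 1.0]}

# Hypothetical per-role extraction counts: (TP, FP, FN).
counts = {"attacker": (8, 2, 2), "victim": (6, 3, 4), "symptom": (1, 1, 3)}

outliers = outlier_roles(target, source, k=1)
score = role_depth_f1(counts, outliers)
```

In this toy setup, "symptom" points in a very different direction from every source role, so it receives the lowest depth and is the only role scored. Restricting F1 to such roles is what lets a metric of this kind isolate performance on the parts of the target domain that are least like the source domain.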