CL · AI · Feb 24, 2025

SNaRe: Domain-aware Data Generation for Low-Resource Event Detection

CMU
arXiv:2502.17394v3 · 2 citations · h-index: 17 · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the cost of expert annotation for event detection in domains like biomedicine and law, though the advance is incremental because it builds on existing data generation approaches.

The paper tackled the problem of label noise and domain drift in synthetic data generation for event detection in specialized domains, introducing SNaRe, a domain-aware framework that achieved average F1 gains of 3-7% in zero-shot/few-shot settings and 4-20% for multilingual generation.

Event Detection (ED) -- the task of identifying event mentions from natural language text -- is critical for enabling reasoning in highly specialized domains such as biomedicine, law, and epidemiology. Data generation has proven to be effective in broadening its utility to wider applications without requiring expensive expert annotations. However, when existing generation approaches are applied to specialized domains, they struggle with label noise, where annotations are incorrect, and domain drift, characterized by a distributional mismatch between generated sentences and the target domain. To address these issues, we introduce SNaRe, a domain-aware synthetic data generation framework composed of three components: Scout, Narrator, and Refiner. Scout extracts triggers from unlabeled target domain data and curates a high-quality domain-specific trigger list using corpus-level statistics to mitigate domain drift. Narrator, conditioned on these triggers, generates high-quality domain-aligned sentences, and Refiner identifies additional event mentions, ensuring high annotation quality. Experimentation on three diverse domain ED datasets reveals how SNaRe outperforms the best baseline, achieving average F1 gains of 3-7% in the zero-shot/few-shot settings and 4-20% F1 improvement for multilingual generation. Analyzing the generated trigger hit rate and human evaluation substantiates SNaRe's stronger annotation quality and reduced domain drift.
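The three-stage flow described above (Scout, then Narrator, then Refiner) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `Mention` type, and the three callables (`extract_triggers`, `generate_sentence`, `find_extra_mentions`) are hypothetical placeholders for whatever models the paper actually uses, and raw per-type frequency stands in for the unspecified "corpus-level statistics".

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from the paper.
Mention = Tuple[str, str]            # (trigger word, event type)
Example = Tuple[str, List[Mention]]  # (sentence, annotated mentions)


def scout(
    unlabeled_docs: List[str],
    extract_triggers: Callable[[str], List[Mention]],
    top_k: int,
) -> Dict[str, List[str]]:
    """Build a domain-specific trigger list from unlabeled target-domain text.

    Assumption: triggers are ranked by raw frequency per event type. The
    abstract only says "corpus-level statistics", so the real criterion
    may differ.
    """
    counts: Dict[str, Counter] = {}
    for doc in unlabeled_docs:
        for trigger, event_type in extract_triggers(doc):
            counts.setdefault(event_type, Counter())[trigger.lower()] += 1
    return {
        event_type: [trigger for trigger, _ in counter.most_common(top_k)]
        for event_type, counter in counts.items()
    }


def narrator(
    trigger_list: Dict[str, List[str]],
    generate_sentence: Callable[[str, str], str],
) -> List[Example]:
    """Generate one sentence per curated trigger, labeled with that trigger."""
    data: List[Example] = []
    for event_type, triggers in trigger_list.items():
        for trigger in triggers:
            sentence = generate_sentence(trigger, event_type)
            data.append((sentence, [(trigger, event_type)]))
    return data


def refiner(
    data: List[Example],
    find_extra_mentions: Callable[[str], List[Mention]],
) -> List[Example]:
    """Add event mentions the Narrator's label missed, without duplicates."""
    refined: List[Example] = []
    for sentence, mentions in data:
        merged = list(mentions)
        for mention in find_extra_mentions(sentence):
            if mention not in merged:
                merged.append(mention)
        refined.append((sentence, merged))
    return refined
```

In this sketch the three callables would typically wrap model calls. Passing them in as plain functions keeps the pipeline structure separate from any particular model choice, which the abstract does not specify.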

