CL · Jun 1, 2025

LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World

Stanford
arXiv:2506.00980v1 · 3 citations · h-index: 9 · ACL
Originality: Incremental advance
AI Analysis

This work addresses the problem of aggregating multilingual sources for global conflict analysis, providing a dataset and methods that are incremental improvements over existing event extraction approaches.

The paper tackles multilingual event analysis by introducing LEMONADE, a large-scale conflict event dataset with 39,786 events across 20 languages, and by proposing two tasks: abstractive event extraction (AEE) and abstractive entity linking (AEL). The best zero-shot system achieves an end-to-end F1 score of 58.3%, and ZEST achieves an F1 score of 45.7% on AEL. Both outperform the zero-shot baselines but lag behind supervised systems.

This paper presents LEMONADE, a large-scale conflict event dataset comprising 39,786 events across 20 languages and 171 countries, with extensive coverage of region-specific entities. LEMONADE is based on a partially reannotated subset of the Armed Conflict Location & Event Data (ACLED), which has documented global conflict events for over a decade. To address the challenge of aggregating multilingual sources for global event analysis, we introduce abstractive event extraction (AEE) and its subtask, abstractive entity linking (AEL). Unlike conventional span-based event extraction, our approach detects event arguments and entities through holistic document understanding and normalizes them across the multilingual dataset. We evaluate various large language models (LLMs) on these tasks, adapt existing zero-shot event extraction systems, and benchmark supervised models. Additionally, we introduce ZEST, a novel zero-shot retrieval-based system for AEL. Our best zero-shot system achieves an end-to-end F1 score of 58.3%, with LLMs outperforming specialized event extraction models such as GoLLIE. For entity linking, ZEST achieves an F1 score of 45.7%, significantly surpassing OneNet, a state-of-the-art zero-shot baseline that achieves only 23.7%. However, these zero-shot results lag behind the best supervised systems by 20.1% and 37.0% in the end-to-end and AEL tasks, respectively, highlighting the need for further research.
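The abstract reports F1 scores over normalized entities rather than text spans. As a minimal sketch of how such a score could be computed, the function below treats each document's entities as a set of identifiers and micro-averages over documents. This is an illustration only: the function name `micro_f1`, the set-based matching rule, and the entity identifiers are hypothetical, and the paper's actual evaluation protocol may differ.

```python
from typing import Iterable, Set, Tuple


def micro_f1(
    gold: Iterable[Set[str]], pred: Iterable[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-document entity sets.

    Assumption: an entity counts as correct when its normalized identifier
    appears in both the gold and the predicted set for the same document.
    """
    tp = fp = fn = 0
    for gold_ids, pred_ids in zip(gold, pred):
        tp += len(gold_ids & pred_ids)  # entities found and correct
        fp += len(pred_ids - gold_ids)  # predicted but not in gold
        fn += len(gold_ids - pred_ids)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical identifiers, made up for this example.
gold_docs = [{"actor:police", "actor:protesters"}, {"actor:militia"}]
pred_docs = [{"actor:police"}, {"actor:militia", "actor:army"}]

precision, recall, f1 = micro_f1(gold_docs, pred_docs)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because the comparison is on identifiers, a prediction is scored the same way regardless of the language of the source document, which is the property the abstractive formulation relies on.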

Code Implementations: 1 repo
