CL AI LG · Feb 23, 2023

Extracting Victim Counts from Text

Mian Zhong, Shehzaad Dhuliawala, Niklas Stoehr

ETH Zurich

arXiv:2302.12367v128.1269 citations · h-index: 13 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a critical need for humanitarian decision-makers by enabling more accurate extraction of fine-grained victim counts from text, though it is incremental as it applies existing QA methods to a specific domain.

The paper extracts victim counts from text descriptions of crisis events by framing the problem as a question answering task with a regression or classification objective. It compares regex, dependency parsing, semantic role labeling-based approaches, and text-to-text models, and analyzes extraction reliability and robustness for real-world humanitarian applications.

Decision-makers in the humanitarian sector rely on timely and exact information during crisis events. Knowing how many civilians were injured during an earthquake is vital to allocate aid properly. Information about such victim counts is often only available within full-text event descriptions from newspapers and other reports. Extracting numbers from text is challenging: numbers have different formats and may require numeric reasoning. This renders purely string-matching-based approaches insufficient. As a consequence, fine-grained counts of injured, displaced, or abused victims beyond fatalities are often not extracted and remain unseen. We cast victim count extraction as a question answering (QA) task with a regression or classification objective. We compare regex, dependency parsing, semantic role labeling-based approaches, and advanced text-to-text models. Beyond model accuracy, we analyze extraction reliability and robustness, which are key for this sensitive task. In particular, we discuss model calibration and investigate few-shot and out-of-distribution performance. Ultimately, we make a comprehensive recommendation on which model to select for different desiderata and data domains. Our work is among the first to apply numeracy-focused large language models in a real-world use case with a positive impact.
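To illustrate why pure string matching falls short, here is a minimal sketch of a regex-style extractor. It is not the authors' baseline: the pattern, the keyword list, and the function name `regex_victim_count` are assumptions made up for this example. It finds a digit string followed within a few words by a victim keyword, and it returns nothing when the count is written in words.

```python
import re

# Hypothetical pattern (an assumption for illustration, not taken from the paper):
# a digit string, then up to three intervening words, then a victim keyword.
PATTERN = re.compile(
    r"(\d[\d,]*)\s+(?:\w+\s+){0,3}?(injured|killed|displaced)",
    re.IGNORECASE,
)


def regex_victim_count(text, keyword):
    """Return the first digit-based count attached to `keyword`, or None."""
    for match in PATTERN.finditer(text):
        if match.group(2).lower() == keyword:
            # Strip thousands separators before converting to an integer.
            return int(match.group(1).replace(",", ""))
    return None


# Digits next to the keyword are found.
print(regex_victim_count("At least 1,200 people were injured in the earthquake.", "injured"))  # 1200

# A count written in words needs numeric reasoning and is missed entirely.
print(regex_victim_count("Two dozen civilians were injured.", "injured"))  # None
```

The second call shows the kind of gap the abstract describes: the text clearly states a count, but no digit string is present to match.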

