CL · May 24, 2023

A Monte Carlo Language Model Pipeline for Zero-Shot Sociopolitical Event Extraction

arXiv:2305.15051v2 · 6 citations
Originality: Incremental advance
AI Analysis

This work lets social scientists flexibly extract dyadic events from text for international relations analysis without retraining a model, though its improvement over existing zero-shot methods is incremental.

The paper tackles zero-shot event extraction for sociopolitical analysis. It develops a multi-stage Monte Carlo language model pipeline that addresses word sense ambiguity, modality sensitivity, and computational inefficiency. The pipeline improves on naive generative LM prompting by at least 17 F1 points and can need as few as 12% of the queries used by a previous zero-shot method.

Current social science efforts automatically populate event databases of "who did what to whom?" tuples by applying event extraction (EE) to text such as news. The event databases are used to analyze sociopolitical dynamics between actor pairs (dyads) in, e.g., international relations. While most EE methods rely heavily on rules or supervised learning, zero-shot event extraction could potentially allow researchers to flexibly specify arbitrary event classes for new research questions. Unfortunately, we find that current zero-shot EE methods, as well as a naive zero-shot approach of simple generative language model (LM) prompting, perform poorly for dyadic event extraction; most suffer from word sense ambiguity, modality sensitivity, and computational inefficiency. We address these challenges with a new fine-grained, multi-stage instruction-following generative LM pipeline, proposing a Monte Carlo approach to deal with, and even take advantage of, the nondeterminism of generative outputs. Our pipeline includes explicit stages of linguistic analysis (synonym generation, contextual disambiguation, argument realization, event modality), improving control and interpretability compared to purely neural methods. This method outperforms other zero-shot EE approaches, and outperforms naive applications of generative LMs by at least 17 F1 percentage points. The pipeline's filtering mechanism greatly improves computational efficiency, allowing it to perform as few as 12% of the queries that a previous zero-shot method uses. Finally, we demonstrate our pipeline's application to dyadic international relations analysis.
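
The idea of treating nondeterministic LM output as something to sample from, rather than something to suppress, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify prompts, sample counts, or thresholds, so every name below (`make_toy_lm`, `monte_carlo_yes_rate`, `passes_stage`), the default of 20 samples, and the 0.5 threshold are assumptions made purely for illustration. A seeded random stub stands in for a real generative LM.

```python
import random
from typing import Callable


def make_toy_lm(p_yes: float, seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for a stochastic LM: answers 'yes' with probability p_yes."""
    rng = random.Random(seed)

    def ask(prompt: str) -> str:
        # A real pipeline would send `prompt` to a generative LM here.
        return "yes" if rng.random() < p_yes else "no"

    return ask


def monte_carlo_yes_rate(ask: Callable[[str], str], prompt: str, n_samples: int = 20) -> float:
    """Estimate P(answer == 'yes') by asking the same question n_samples times."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    yes_count = sum(
        1 for _ in range(n_samples) if ask(prompt).strip().lower() == "yes"
    )
    return yes_count / n_samples


def passes_stage(ask: Callable[[str], str], prompt: str,
                 threshold: float = 0.5, n_samples: int = 20) -> bool:
    """Keep a candidate only if the sampled 'yes' rate reaches the (assumed) threshold."""
    return monte_carlo_yes_rate(ask, prompt, n_samples) >= threshold
```

Under these assumptions, a candidate sentence that fails an early, cheap stage would never be sent to later stages, which is one plausible way a filtering mechanism could reduce the total number of LM queries.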

