CL | Dec 2, 2016

Creating a Real-Time, Reproducible Event Dataset

arXiv:1612.00866v1 | 5 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This paper addresses the slow, non-reproducible generation of event data used by political scientists and analysts, though it appears incremental because it builds on existing methods with new tools.

The paper tackles outdated methods of generating political event data by introducing Phoenix, a next-generation dataset that uses modern NLP and digitized news content to produce daily-updated data. It checks face validity by examining the data for the conflict in Syria and by comparing Phoenix with the Integrated Crisis Early Warning System data.

The generation of political event data has remained much the same since the mid-1990s, both in terms of data acquisition and the process of coding text into data. Since the 1990s, however, there have been significant improvements in open-source natural language processing software and in the availability of digitized news content. This paper presents a new, next-generation event dataset, named Phoenix, that builds on these and other advances. This dataset includes improvements in the underlying news collection process and event coding software, along with the creation of a general processing pipeline necessary to produce daily-updated data. This paper provides a face validity check by briefly examining the data for the conflict in Syria, and a comparison between Phoenix and the Integrated Crisis Early Warning System data.

Code Implementations: 1 repo
