CL · Apr 8, 2022

CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction

arXiv:2204.03871v1592 citations · h-index: 14 · Has Code
Originality: Synthesis-oriented
AI Analysis

This resource addresses a gap in economic and financial text mining by providing a specialized dataset for researchers, though it is incremental as it builds on existing event extraction methods.

The authors introduced CrudeOilNews, the first annotated corpus for event extraction in commodity news, containing 425 articles with about 11,000 events, and demonstrated its utility by training basic models for machine labeling.

In this paper, we present CrudeOilNews, a corpus of English crude oil news for event extraction. It is the first of its kind for commodity news and contributes to resource building for economic and financial text mining. This paper describes the data collection process, the annotation methodology, and the event typology used in producing the corpus. First, a seed set of 175 news articles was manually annotated, of which a subset of 25 articles was used as the adjudicated reference test set for inter-annotator and system evaluation. Agreement was generally substantial and annotator performance was adequate, indicating that the annotation scheme produces consistent event annotations of high quality. Subsequently, the dataset was expanded through (1) data augmentation and (2) human-in-the-loop active learning. The resulting corpus has 425 news articles with approximately 11k annotated events. As part of the active learning process, the corpus was used to train basic event extraction models for machine labeling; the resulting models also serve as a validation, or pilot study, demonstrating the use of the corpus for machine learning. The annotated corpus is made available for academic research purposes at https://github.com/meisin/CrudeOilNews-Corpus.
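The abstract reports that inter-annotator agreement was "generally substantial" but does not name the metric here. As a minimal sketch, assuming agreement is measured with Cohen's kappa over per-item event labels (an assumption, not something the abstract confirms), the computation could look like the following. The label names and the `cohen_kappa` helper are hypothetical and are not taken from the corpus's event typology.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: the corpus abstract does not state which agreement
    metric was used; kappa is shown only as a common choice.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equally long")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies,
    # summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical event-type labels, invented for illustration only.
annotator_1 = ["price_rise", "price_fall", "supply_cut", "price_rise", "none", "price_fall"]
annotator_2 = ["price_rise", "price_fall", "supply_cut", "price_fall", "none", "price_fall"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are described as "substantial" agreement, which matches the wording of the abstract, though the paper itself should be consulted for the metric and values actually reported.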

Code Implementations: 1 repo
