CL · AI · LG · May 22, 2022

TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks

arXiv:2205.10726v2 · 584 citations · h-index: 17
Originality: Synthesis-oriented
AI Analysis

This paper addresses delayed outbreak detection in public health, but its contribution is incremental, as it primarily introduces a new dataset.

The authors tackled the lack of labeled datasets for detecting foodborne illness outbreaks from social media by presenting TWEET-FID, the first publicly available annotated dataset for multiple detection tasks, and provided results using state-of-the-art deep learning methods.

Foodborne illness is a serious but preventable public health problem: delays in detecting the associated outbreaks result in productivity loss, expensive recalls, public safety hazards, and even loss of life. While social media is a promising source for identifying unreported foodborne illnesses, there is a dearth of labeled datasets for developing effective outbreak detection models. To accelerate the development of machine learning-based models for foodborne outbreak detection, we present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks. TWEET-FID, collected from Twitter, is annotated with three facets: tweet class, entity type, and slot type, with labels produced by experts as well as by crowdsource workers. We introduce several domain tasks leveraging these three facets: text relevance classification (TRC), entity mention detection (EMD), and slot filling (SF). We describe the end-to-end methodology for dataset design, creation, and labeling to support model development for these tasks. We provide a comprehensive set of results for these tasks using state-of-the-art single- and multi-task deep learning methods on the TWEET-FID dataset. This dataset opens opportunities for future research in foodborne outbreak detection.
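
The three facets and three tasks described in the abstract can be pictured with a small data-structure sketch. This is an illustration only, not the dataset's actual schema: the class name `AnnotatedTweet`, the label strings (`relevant`, `SYMPTOM`, `FOOD`, `LOCATION`, `WHAT`, `WHERE`), the BIO tagging scheme, and the example tweet are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedTweet:
    """Hypothetical record carrying the three annotation facets."""
    tokens: List[str]
    tweet_class: str        # facet 1: tweet-level label (assumed values)
    entity_tags: List[str]  # facet 2: one BIO entity tag per token (assumed scheme)
    slot_tags: List[str]    # facet 3: one BIO slot tag per token (assumed scheme)

    def __post_init__(self) -> None:
        # Token-level facets must line up with the tokens one-to-one.
        if not (len(self.tokens) == len(self.entity_tags) == len(self.slot_tags)):
            raise ValueError("token-level facets must align with tokens")


def trc_example(tweet: AnnotatedTweet) -> Tuple[str, str]:
    """Text relevance classification view: (text, tweet-level label)."""
    return " ".join(tweet.tokens), tweet.tweet_class


def spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode BIO tags into (label, text) spans.

    Serves both the EMD view (entity tags) and the SF view (slot tags).
    An I- tag that does not continue the open span is treated like O.
    """
    out: List[Tuple[str, str]] = []
    label, buf = None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:
                out.append((label, " ".join(buf)))
            label, buf = tag[2:], [tok]
        elif tag.startswith("I-") and label == tag[2:]:
            buf.append(tok)
        else:
            if label is not None:
                out.append((label, " ".join(buf)))
            label, buf = None, []
    if label is not None:
        out.append((label, " ".join(buf)))
    return out


# Made-up example tweet; not taken from TWEET-FID.
example = AnnotatedTweet(
    tokens=["Got", "sick", "after", "eating", "sushi", "downtown"],
    tweet_class="relevant",
    entity_tags=["O", "B-SYMPTOM", "O", "O", "B-FOOD", "B-LOCATION"],
    slot_tags=["O", "B-SYMPTOM", "O", "O", "B-WHAT", "B-WHERE"],
)
```

Calling `trc_example(example)` yields the input for a relevance classifier, while `spans(example.tokens, example.entity_tags)` and `spans(example.tokens, example.slot_tags)` yield the span-level targets for EMD and SF respectively. Keeping all three facets on one record is one way a multi-task model could share a single tokenized input across tasks.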
