CL · Feb 3, 2025

Annotation Tool and Dataset for Fact-Checking Podcasts

arXiv:2502.01402v1 · 3 citations · h-index: 5 · WWW
Originality: Incremental advance
AI Analysis

This paper addresses the problem of verifying claims in diverse, multilingual podcast content for fact-checkers and researchers. It is an incremental improvement that integrates existing tools with novel annotation capabilities.

The authors tackled the challenge of fact-checking podcasts by developing a tool for real-time annotation during playback, which enabled the creation of a high-quality dataset used to fine-tune multilingual transformer models like XLM-RoBERTa for tasks such as claim detection and stance classification.

Podcasts are a popular medium on the web, featuring diverse and multilingual content that often includes unverified claims. Fact-checking podcasts is a challenging task, requiring transcription, annotation, and claim verification, all while preserving the contextual details of spoken content. Our tool offers a novel approach to tackle these challenges by enabling real-time annotation of podcasts during playback. This unique capability allows users to listen to the podcast and annotate key elements, such as check-worthy claims, claim spans, and contextual errors, simultaneously. By integrating advanced transcription models like OpenAI's Whisper and leveraging crowdsourced annotations, we create high-quality datasets to fine-tune multilingual transformer models such as XLM-RoBERTa for tasks like claim detection and stance classification. Furthermore, we release the annotated podcast transcripts and sample annotations with preliminary experiments.
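The step from playback-time annotations to training data can be illustrated with a minimal sketch. It assumes transcript segments that carry `start`, `end`, and `text` fields (the shape Whisper's segment output uses) and a hypothetical `Annotation` record holding the playback time, a label, and the annotated claim text. The names `Segment`, `Annotation`, `find_segment`, and `to_training_example` are illustrative only and are not taken from the paper's tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One timestamped transcript segment (times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Annotation:
    """Hypothetical record of one annotation made during playback."""
    playback_time: float  # player position when the annotator marked the claim
    label: str            # e.g. "check-worthy"
    span_text: str        # exact words the annotator highlighted


def find_segment(segments: list[Segment], t: float) -> Optional[int]:
    """Return the index of the segment whose time window contains t."""
    for i, seg in enumerate(segments):
        if seg.start <= t < seg.end:
            return i
    return None


def to_training_example(segments: list[Segment], ann: Annotation) -> Optional[dict]:
    """Align an annotation with its segment and locate the claim span.

    Returns None when the time falls outside every segment or the
    highlighted text does not occur in the matched segment.
    """
    idx = find_segment(segments, ann.playback_time)
    if idx is None:
        return None
    text = segments[idx].text
    start = text.find(ann.span_text)
    if start == -1:
        return None
    return {
        "text": text,
        "label": ann.label,
        "span": (start, start + len(ann.span_text)),  # character offsets
    }


if __name__ == "__main__":
    # Made-up transcript and annotation, for illustration only.
    segments = [
        Segment(0.0, 4.0, "Welcome back to the show."),
        Segment(4.0, 9.5, "Studies say coffee cures colds, apparently."),
    ]
    ann = Annotation(6.2, "check-worthy", "coffee cures colds")
    print(to_training_example(segments, ann))
```

Records of this shape (segment text, label, character span) are one plausible input format for fine-tuning a sequence or token classifier on claim detection. The actual schema of the released dataset may differ.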

