CL · Feb 3, 2025

Annotation Tool and Dataset for Fact-Checking Podcasts

arXiv:2502.01402v14.93 citations · h-index: 5 · WWW

Originality: Incremental advance

AI Analysis

This work addresses the problem of checking unverified claims in diverse, multilingual podcast content, for the benefit of fact-checkers and researchers. It is an incremental improvement that integrates existing tools with new annotation capabilities.

The authors tackled the challenge of fact-checking podcasts by developing a tool for real-time annotation during playback, which enabled the creation of a high-quality dataset used to fine-tune multilingual transformer models like XLM-RoBERTa for tasks such as claim detection and stance classification.

Podcasts are a popular medium on the web, featuring diverse and multilingual content that often includes unverified claims. Fact-checking podcasts is a challenging task, requiring transcription, annotation, and claim verification, all while preserving the contextual details of spoken content. Our tool offers a novel approach to tackle these challenges by enabling real-time annotation of podcasts during playback. This unique capability allows users to listen to the podcast and annotate key elements, such as check-worthy claims, claim spans, and contextual errors, simultaneously. By integrating advanced transcription models like OpenAI's Whisper and leveraging crowdsourced annotations, we create high-quality datasets to fine-tune multilingual transformer models such as XLM-RoBERTa for tasks like claim detection and stance classification. Furthermore, we release the annotated podcast transcripts and sample annotations with preliminary experiments.
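To make the idea of annotating during playback concrete, here is a minimal sketch of how a playback timestamp might be mapped to a transcript segment and recorded as a labeled claim span. It is not the authors' implementation. The `Segment` and `Annotation` classes, the `segment_at` and `annotate` helpers, the label string, and the sample transcript are all hypothetical; the only thing borrowed from a real tool is the assumption that transcript segments carry `start`, `end`, and `text` fields, as Whisper's segment output does.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One transcript segment (hypothetical; mirrors start/end/text fields)."""
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


@dataclass(frozen=True)
class Annotation:
    """A labeled character span inside one segment (hypothetical schema)."""
    segment_index: int
    label: str
    span: Tuple[int, int]  # [begin, end) character offsets in the segment text


def segment_at(segments: List[Segment], t: float) -> Optional[int]:
    """Return the index of the segment whose [start, end) contains time t."""
    starts = [s.start for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i].end:
        return i
    return None


def annotate(segments: List[Segment], t: float, label: str, phrase: str) -> Annotation:
    """Attach a label to `phrase` inside the segment playing at time t."""
    i = segment_at(segments, t)
    if i is None:
        raise ValueError(f"no segment covers t={t}")
    begin = segments[i].text.find(phrase)
    if begin < 0:
        raise ValueError("phrase not found in the current segment")
    return Annotation(i, label, (begin, begin + len(phrase)))


# Made-up sample transcript, sorted by start time.
SEGMENTS = [
    Segment(0.0, 4.2, "Welcome back to the show."),
    Segment(4.2, 9.8, "Unemployment fell by half last year."),
    Segment(9.8, 13.0, "Let's talk about why."),
]

if __name__ == "__main__":
    # The listener pauses at 5.0 s and marks a claim span.
    print(annotate(SEGMENTS, 5.0, "check-worthy claim", "fell by half"))
```

Storing character offsets rather than copied text keeps each annotation tied to the exact transcript segment, which is one plausible way to preserve the spoken context the abstract emphasizes.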
