CL · LG · SI · Nov 22, 2022

Time-Aware Datasets are Adaptive Knowledgebases for the New Normal

arXiv:2211.12508v1 · 2 citations · h-index: 61
Originality: Synthesis-oriented
AI Analysis

This paper addresses the challenge of detecting continuously evolving misinformation, especially about COVID-19, but its contribution is incremental: it focuses on dataset creation and simple time-aware methods.

The paper tackles the problem that language models trained on static knowledge are limited when detecting evolving misinformation. It shows that incorporating time-awareness improves classifier accuracy, and it presents COVID-TAD, a large-scale dataset spanning 25 months that is orders of magnitude bigger than related datasets.

Recent advances in text classification and knowledge capture in language models have relied on availability of large-scale text datasets. However, language models are trained on static snapshots of knowledge and are limited when that knowledge evolves. This is especially critical for misinformation detection, where new types of misinformation continuously appear, replacing old campaigns. We propose time-aware misinformation datasets to capture time-critical phenomena. In this paper, we first present evidence of evolving misinformation and show that incorporating even simple time-awareness significantly improves classifier accuracy. Second, we present COVID-TAD, a large-scale COVID-19 misinformation dataset spanning 25 months. It is the first large-scale misinformation dataset that contains multiple snapshots of a datastream and is orders of magnitude bigger than related misinformation datasets. We describe the collection and labeling process, as well as preliminary experiments.
