IR, SI · Apr 22, 2017

Distant Supervision for Topic Classification of Tweets in Curated Streams

arXiv:1704.06726v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

The approach gives news outlets and organizations a low-cost way to categorize tweets in dynamic social media streams, though the contribution is incremental: it applies existing distant supervision methods to this specific domain.

The paper tackles topic classification for tweets in curated streams by using topically focused streams as a source of distant supervision for training classifiers. The resulting classifiers perform well and adapt to topic drift without manual labeling.

We tackle the challenge of topic classification of tweets in the context of analyzing a large collection of streams curated by news outlets and other organizations to deliver relevant content to users. Our approach is novel in applying distant supervision based on semi-automatically identifying curated streams that are topically focused (for example, on politics, entertainment, or sports). These streams provide a source of labeled data to train topic classifiers that can then be applied to categorize tweets from more topically diffuse streams. Experiments on both noisy labels and human ground-truth judgments demonstrate that our approach yields good topic classifiers essentially "for free", and that topic classifiers trained in this manner are able to dynamically adjust for topic drift as news on Twitter evolves.
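A minimal sketch of the distant-supervision idea described in the abstract, assuming each topically focused stream maps to exactly one topic and every tweet from it inherits that label. The stream names in `FOCUSED_STREAMS`, the helpers `distant_labels` and `retrain_on_window`, and the choice of multinomial Naive Bayes are hypothetical illustrations, not the authors' pipeline or classifier. Retraining on a sliding window of recent tweets is likewise only one assumed way to follow topic drift.

```python
"""Hypothetical distant-supervision sketch; not the paper's implementation."""
import math
import re
from collections import Counter, defaultdict

# Hypothetical focused streams, each assumed to cover exactly one topic.
FOCUSED_STREAMS = {
    "politics_desk": "politics",
    "sports_desk": "sports",
    "showbiz_desk": "entertainment",
}


def tokenize(text):
    """Lowercase word tokens from a deliberately simple regex."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def distant_labels(tweets):
    """Label (stream, text) pairs by stream topic; tweets from diffuse streams are dropped."""
    return [(FOCUSED_STREAMS[s], t) for s, t in tweets if s in FOCUSED_STREAMS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, labeled):
        if not labeled:
            raise ValueError("need at least one labeled tweet")
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for label, text in labeled:
            self.doc_counts[label] += 1
            for w in tokenize(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)
        # Total token count per topic, used in the smoothing denominator.
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.doc_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = max(len(self.vocab), 1)  # guard against an empty vocabulary
        best, best_score = None, -math.inf
        for c in self.doc_counts:
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def retrain_on_window(history, window):
    """Fit on only the most recent `window` distantly labeled tweets (assumed drift handling)."""
    return NaiveBayes().fit(distant_labels(history)[-window:])


if __name__ == "__main__":
    history = [
        ("politics_desk", "Senate passes the budget bill after long debate"),
        ("sports_desk", "Late goal wins the match for the home team"),
        ("showbiz_desk", "New movie trailer drops ahead of the premiere"),
        ("general_news", "Weather update for the weekend"),  # diffuse stream: no label
    ]
    clf = NaiveBayes().fit(distant_labels(history))
    print(clf.predict("The bill goes to the Senate for a vote"))  # politics
```

Because the labels come from stream membership rather than annotators, refitting is cheap, so a fresh model can be trained on each new window as the news cycle shifts. The trade-off is that any off-topic tweet posted to a focused stream becomes label noise.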
