DC · IR · SI · Dec 4, 2018

Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm

arXiv:1812.01141v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient, real-time tweet analytics for social media platforms. The contribution is incremental: it builds on existing Naïve Bayes models by adding distributed processing.

The paper tackles the challenge of real-time tweet topic classification using hashtags by proposing a distributed online approach implemented on Apache Storm, achieving up to 97% accuracy and a 37% increase in throughput on eight processors.
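Because a Naïve Bayes model is built entirely from counts, the counting work can be split across parallel workers and still yield exactly the same model. The following is a minimal sketch under stated assumptions. It assumes words are hash-partitioned across workers, loosely in the spirit of Storm's fields grouping, but it simulates the workers in plain Python and uses no Storm API. The helper names `shard_of`, `route`, and `count_worker` are hypothetical.

```python
from collections import Counter
import zlib

def shard_of(word: str, n_workers: int) -> int:
    """Stable hash partitioning (hypothetical helper; Storm itself is not used).
    zlib.crc32 is used instead of hash() so routing is identical across runs."""
    return zlib.crc32(word.encode("utf-8")) % n_workers

def route(tweets, n_workers):
    """Split (tokens, hashtag) pairs into per-worker streams of (word, hashtag) updates."""
    streams = [[] for _ in range(n_workers)]
    for tokens, hashtag in tweets:
        for word in tokens:
            streams[shard_of(word, n_workers)].append((word, hashtag))
    return streams

def count_worker(stream):
    """Each simulated worker keeps counts only for the words routed to it."""
    return Counter(stream)

tweets = [
    (["goal", "match", "tonight"], "#football"),
    (["vote", "election", "tonight"], "#politics"),
    (["match", "referee"], "#football"),
]

# Count on four simulated workers, then merge their disjoint shards.
shards = [count_worker(s) for s in route(tweets, n_workers=4)]
merged = Counter()
for c in shards:
    merged.update(c)

# Reference result computed on a single worker.
central = Counter((w, h) for tokens, h in tweets for w in tokens)
print(merged == central)  # True
```

Each word is owned by exactly one worker, so the shards never overlap and merging reduces to a union of counts. This is one plausible reason a count-based model parallelizes well; the paper's actual topology is not described here.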

Twitter is a popular social network platform where users interact and post short texts of up to 280 characters, called tweets. Hashtags, hyperlinked words in tweets, have become increasingly crucial for tweet retrieval and search. Using hashtags for tweet topic classification is challenging because of context dependence among words, slang, abbreviations, and emoticons in short tweets, along with the evolving use of hashtags. Since Twitter generates millions of tweets daily, tweet analytics is a fundamental big data stream problem that often requires real-time distributed processing. This paper proposes a distributed online approach to tweet topic classification with hashtags. Implemented on Apache Storm, a distributed real-time framework, our approach incrementally identifies and updates a set of strong predictors in the Naïve Bayes model for classifying each incoming tweet instance. Preliminary experiments show promising results, with up to 97% accuracy and a 37% increase in throughput on eight processors.
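The abstract does not say how strong predictors are chosen, so the following is a hedged illustration rather than the authors' method. It implements an online multinomial Naïve Bayes classifier with Laplace smoothing, where hashtags are the class labels. The assumed strong-predictor rule keeps the `top_k` words per hashtag whose smoothed log-likelihood most exceeds that of the best rival hashtag. The class name `OnlineNaiveBayes` and the parameters `top_k` and `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict

class OnlineNaiveBayes:
    """Incrementally updated multinomial Naive Bayes over hashtag classes.

    Hypothetical sketch: 'strong predictors' are ASSUMED to be the top_k words
    per class ranked by log-likelihood margin over the best rival class.
    """

    def __init__(self, top_k: int = 50, alpha: float = 1.0):
        self.top_k = top_k
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_docs = Counter()              # tweets seen per hashtag
        self.word_counts = defaultdict(Counter)  # hashtag -> word -> count
        self.class_tokens = Counter()            # total tokens per hashtag
        self.vocab = set()

    def update(self, tokens, hashtag):
        """Absorb one labelled tweet; no full retraining pass is needed."""
        self.class_docs[hashtag] += 1
        self.word_counts[hashtag].update(tokens)
        self.class_tokens[hashtag] += len(tokens)
        self.vocab.update(tokens)

    def _log_p(self, word, c):
        """Smoothed log P(word | class c)."""
        v = len(self.vocab)
        return math.log((self.word_counts[c][word] + self.alpha)
                        / (self.class_tokens[c] + self.alpha * v))

    def strong_predictors(self, c):
        """Assumed rule: words whose likelihood under c most exceeds the best rival."""
        others = [o for o in self.class_docs if o != c]

        def margin(w):
            rival = max((self._log_p(w, o) for o in others), default=0.0)
            return self._log_p(w, c) - rival

        return set(sorted(self.vocab, key=margin, reverse=True)[: self.top_k])

    def predict(self, tokens):
        """Score every class over the same union of selected words."""
        feats = set()
        for c in self.class_docs:
            feats |= self.strong_predictors(c)
        n = sum(self.class_docs.values())

        def score(c):
            prior = math.log(self.class_docs[c] / n)
            return prior + sum(self._log_p(w, c) for w in tokens if w in feats)

        return max(self.class_docs, key=score)

nb = OnlineNaiveBayes(top_k=3)
for tokens, tag in [
    (["goal", "match", "tonight"], "#football"),
    (["vote", "election", "tonight"], "#politics"),
    (["match", "referee", "goal"], "#football"),
    (["election", "poll", "vote"], "#politics"),
]:
    nb.update(tokens, tag)

print(nb.predict(["goal", "referee"]))  # #football
print(nb.predict(["poll", "vote"]))     # #politics
```

Scoring every class over one shared feature set keeps the scores comparable. If each class were scored over its own predictor set, classes with fewer matching words would be unfairly favoured. Recomputing the predictor sets on every prediction is deliberately naive. A streaming deployment would more plausibly refresh them periodically as counts change.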
