ML, CL, DB, DC, IR | Dec 27, 2016

Distributed Real-Time Sentiment Analysis for Big Data Social Streams

arXiv:1612.08543v1 | 30 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of real-time analytics for big data applications, offering a scalable solution for domains like social media monitoring, though it is incremental as it builds on existing distributed computing platforms and algorithms.

The paper tackles the challenge of performing real-time sentiment analysis on high-speed social data streams by developing Sentinel, a distributed system that achieves processing times under 10 milliseconds per instance and maintains over 90% accuracy on Twitter data.

The big data trend has pushed data-centric systems to handle continuous, fast data streams. In recent years, real-time analytics on stream data has emerged as a new research field that aims to answer queries about what is happening now with negligible delay. The real challenge in real-time stream processing is that it is impossible to store every instance of the data, so online analytical algorithms are used instead. To perform real-time analytics, data must be pre-processed so that only a short summary of the stream is kept in main memory. In addition, because instances arrive at high speed, the average processing time per instance must be short enough that incoming instances are not lost before being captured. Lastly, the learner needs to provide high analytical accuracy. Sentinel is a distributed system written in Java that aims to solve this challenge by distributing both the processing and the learning. Sentinel is built on top of Apache Storm, a distributed computing platform. Sentinel's learner, the Vertical Hoeffding Tree, is a parallel decision-tree learning algorithm based on VFDT (Very Fast Decision Tree) that enables parallel classification in distributed environments. Sentinel also uses SpaceSaving to maintain a summary of the data stream, which it stores in a synopsis data structure. The application of Sentinel to the Twitter Public Stream API is shown, and the results are discussed.
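To make the stream-summary idea concrete, here is a minimal sketch of the SpaceSaving algorithm in Java. It is an illustration only and is not taken from Sentinel's source code: the class name `SpaceSavingSketch` and its methods `offer`, `estimate`, `error`, and `size` are hypothetical, and items are assumed to be strings such as tokens or hashtags. The sketch keeps at most `capacity` counters. When a new item arrives and all counters are in use, the item with the smallest count is evicted and the newcomer inherits that count plus one, so every estimate is an upper bound on the true frequency.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Sentinel or Apache Storm.
class SpaceSavingSketch {
    private final int capacity;                                // maximum number of monitored items
    private final Map<String, Long> counts = new HashMap<>();  // estimated count per monitored item
    private final Map<String, Long> errors = new HashMap<>();  // maximum overestimation per item

    SpaceSavingSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Process one stream item.
    void offer(String item) {
        Long current = counts.get(item);
        if (current != null) {
            // Already monitored: increment its counter.
            counts.put(item, current + 1);
        } else if (counts.size() < capacity) {
            // Free slot available: start monitoring with an exact count of 1.
            counts.put(item, 1L);
            errors.put(item, 0L);
        } else {
            // No free slot: find and evict the item with the smallest count.
            // A linear scan keeps the sketch short; it costs O(capacity) per eviction.
            String minItem = null;
            long minCount = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (e.getValue() < minCount) {
                    minCount = e.getValue();
                    minItem = e.getKey();
                }
            }
            counts.remove(minItem);
            errors.remove(minItem);
            // The newcomer inherits the evicted count, which bounds its overestimation.
            counts.put(item, minCount + 1);
            errors.put(item, minCount);
        }
    }

    // Estimated frequency (an upper bound on the true count); 0 if the item is not monitored.
    long estimate(String item) {
        return counts.getOrDefault(item, 0L);
    }

    // Maximum amount by which estimate(item) may exceed the true count.
    long error(String item) {
        return errors.getOrDefault(item, 0L);
    }

    // Number of items currently monitored; never exceeds capacity.
    int size() {
        return counts.size();
    }
}
```

For example, with a capacity of 2 and the stream a, a, b, c, the sketch ends up monitoring a (estimate 2, error 0) and c (estimate 2, error 1), while b has been evicted. Memory use stays fixed no matter how long the stream runs, which is the property the abstract relies on when it says only a short summary is kept in main memory. A production implementation would likely replace the linear scan with a structure that finds the minimum counter in constant time.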

