IR · DB · Oct 27, 2012

Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin

arXiv:1210.7350v1 · 114 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses real-time data processing for Twitter users seeking timely information; it is an incremental case study rather than a fundamental breakthrough.

Twitter tackled the challenge of providing real-time related query suggestions and spelling corrections within minutes of breaking news events by developing a custom in-memory processing engine, which replaced an initial Hadoop-based system that failed to meet latency requirements.

We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.

