DC, IR · Apr 16, 2012

Cloudpress 2.0: A MapReduce Approach for News Retrieval on the Cloud

arXiv:1204.3471v1
Originality: Incremental advance
AI Analysis

The paper addresses the need for scalable news retrieval over massive volumes of data, but the contribution appears incremental because it builds on established paradigms such as MapReduce and cloud computing.

The paper tackles the problem of processing large volumes of news articles by proposing Cloudpress 2.0, a scalable, fault-tolerant news retrieval system built on MapReduce and cloud computing. It reports efficient, fast retrieval and adds features such as query expansion and extractive summarization.

In this era of the Internet, the number of news articles published every minute of every day is enormous. Because of this explosive growth, news retrieval systems must process articles frequently and intensively, and the news retrieval systems in use today cannot cope with these data-intensive computations. Cloudpress 2.0, presented here, is designed and implemented to be scalable, robust and fault tolerant. All the processes involved in news retrieval, such as fetching, pre-processing, indexing, storing and summarizing, exploit the MapReduce paradigm and harness the power of cloud computing. The system uses novel approaches for parallel processing, for storing news articles in a distributed database and for visualizing them in 3D. It uses Lucene-based indexing for efficient, fast retrieval, and it includes a novel query expansion feature for searching news articles. Cloudpress 2.0 also supports on-the-fly extractive summarization of news articles based on the input query.
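The abstract states that fetching, pre-processing, indexing, storing and summarizing all follow the MapReduce paradigm. As a rough illustration of what that means for the indexing step, the sketch below builds an inverted index with explicit map, shuffle and reduce phases, simulated in a single Python process. It is not Cloudpress 2.0's code: the paper uses Lucene-based indexing on a cloud cluster, whereas the function names here (`tokenize`, `map_article`, `shuffle`, `reduce_term`, `build_index`), the stop-word list and the `(doc_id, term_frequency)` posting format are all hypothetical, chosen only to show the data flow.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stop-word list for the pre-processing step (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is"}

Posting = Tuple[str, int]  # (doc_id, term_frequency), an illustrative format


def tokenize(text: str) -> List[str]:
    """Pre-processing: lowercase, split on non-alphanumerics, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def map_article(doc_id: str, text: str) -> Iterator[Tuple[str, Posting]]:
    """Map phase: emit (term, (doc_id, tf)) once per distinct term in one article."""
    counts: Dict[str, int] = defaultdict(int)
    for term in tokenize(text):
        counts[term] += 1
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def shuffle(pairs: Iterable[Tuple[str, Posting]]) -> Dict[str, List[Posting]]:
    """Shuffle phase: group intermediate values by key (done by the framework in a real cluster)."""
    groups: Dict[str, List[Posting]] = defaultdict(list)
    for term, posting in pairs:
        groups[term].append(posting)
    return groups


def reduce_term(term: str, postings: List[Posting]) -> Tuple[str, List[Posting]]:
    """Reduce phase: produce the posting list for one term, sorted by doc_id."""
    return term, sorted(postings)


def build_index(articles: Dict[str, str]) -> Dict[str, List[Posting]]:
    """Run map over every article, shuffle the intermediate pairs, then reduce each term."""
    intermediate = (
        pair for doc_id, text in articles.items() for pair in map_article(doc_id, text)
    )
    return dict(reduce_term(term, postings) for term, postings in shuffle(intermediate).items())


if __name__ == "__main__":
    articles = {
        "n1": "Cloud computing cuts the cost of news retrieval",
        "n2": "MapReduce scales news indexing on the cloud",
    }
    index = build_index(articles)
    print(index["news"])   # [('n1', 1), ('n2', 1)]
    print(index["cloud"])  # [('n1', 1), ('n2', 1)]
```

In an actual MapReduce deployment the framework, not the program, performs the shuffle and runs many map and reduce tasks in parallel on different machines. Keeping each phase a pure function of its inputs means a failed task can simply be re-executed, which is one reason MapReduce frameworks can tolerate node failures.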
