IR, DL, Mar 21, 2013

Taming the zoo - about algorithms implementation in the ecosystem of Apache Hadoop

arXiv:1303.5367v3, 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for scalable data mining tools in scientific research, but it is incremental: it applies existing methods to a new domain.

The paper describes the Content Analysis System (CoAnSys), a framework for mining scientific publications using Apache Hadoop, implementing algorithms for classification, categorization, and citation matching to handle big data problems efficiently on Hadoop clusters.

Content Analysis System (CoAnSys) is a research framework for mining scientific publications using Apache Hadoop. This article describes the algorithms currently implemented in CoAnSys, including classification, categorization, and citation matching of scientific publications. The size of the input data places these algorithms in the class of big data problems, which can be solved efficiently on Hadoop clusters.
