IRDC, Feb 10, 2018

Document Classification Using Distributed Machine Learning

arXiv:1802.03597v1, 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses document classification for Turkish news, but it is incremental as it applies existing methods to a new dataset.

The paper tackles document classification for Turkish news using Naïve Bayes on distributed technologies such as Hadoop and Spark. This summary does not state the performance figures achieved.

In this paper, we investigate the performance and success rates of the Naïve Bayes classification algorithm for automatically classifying Turkish news into predetermined categories such as economy, life, and health. We use Apache Big Data technologies such as Hadoop, HDFS, Spark, and Mahout, and apply these distributed technologies to machine learning.
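
To make the classification step concrete, here is a minimal single-machine sketch of multinomial Naïve Bayes with Laplace smoothing in plain Python. It is not the paper's implementation, which runs on Spark and Mahout; the function names, the `alpha` parameter, and the four toy training documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs  : list of (tokens, label) pairs, tokens being a list of strings
    alpha : Laplace smoothing constant (assumed value, not from the paper)
    """
    class_doc_counts = Counter()        # number of documents per category
    word_counts = defaultdict(Counter)  # per-category token frequencies
    vocab = set()
    for tokens, label in docs:
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(class_doc_counts.values())
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        "log_prior": {c: math.log(n / total_docs)
                      for c, n in class_doc_counts.items()},
        "total_words": {c: sum(word_counts[c].values())
                        for c in class_doc_counts},
    }


def predict(model, tokens):
    """Return the category with the highest log-posterior score."""
    alpha = model["alpha"]
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for label, log_prior in model["log_prior"].items():
        denom = model["total_words"][label] + alpha * vocab_size
        score = log_prior
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # skip tokens never seen during training
            count = model["word_counts"][label][tok]
            score += math.log((count + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus: tokenised Turkish headlines with category labels.
TRAIN = [
    (["borsa", "faiz", "dolar"], "economy"),
    (["faiz", "enflasyon", "banka"], "economy"),
    (["hastane", "doktor", "tedavi"], "health"),
    (["doktor", "ilaç", "sağlık"], "health"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, ["faiz", "dolar"]))
print(predict(model, ["doktor", "hastane"]))
```

Scores are summed in log space to avoid underflow on long documents. In a distributed setting, the per-category counting in `train_naive_bayes` is the part that would be spread across workers, since counts from separate data partitions can simply be added together.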

