IR DC Feb 10, 2018

Document Classification Using Distributed Machine Learning

arXiv:1802.03597v1 (4 citations)

Originality: Synthesis-oriented

AI Analysis

This work addresses document classification for Turkish news, but it is incremental as it applies existing methods to a new dataset.

The paper applies Naïve Bayes to document classification of Turkish news, running on distributed technologies such as Hadoop and Spark. The abstract does not state specific performance figures.

In this paper, we investigate the performance and success rates of the Naïve Bayes classification algorithm for automatically classifying Turkish news into predetermined categories such as economy, life, and health. We use Apache Big Data technologies such as Hadoop, HDFS, Spark, and Mahout, and apply these distributed technologies to machine learning.
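To make the classification step concrete, here is a minimal single-machine sketch of multinomial Naïve Bayes with add-one (Laplace) smoothing. It is not the authors' pipeline: the paper runs on Spark and Mahout, whereas this uses only the Python standard library. The function names, the whitespace tokenization, the smoothing constant, and the four-document toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (not from the paper): (tokens, category) pairs.
TOY_DOCS = [
    ("borsa faiz dolar".split(), "economy"),
    ("faiz enflasyon banka".split(), "economy"),
    ("hastane doktor tedavi".split(), "health"),
    ("doktor grip tedavi".split(), "health"),
]


def train_naive_bayes(docs, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from labeled token lists."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(class_counts.values())
    log_prior = {c: math.log(n / total_docs) for c, n in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        # Additive smoothing: every vocabulary word gets alpha pseudo-counts.
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return {"log_prior": log_prior, "log_likelihood": log_likelihood, "vocab": vocab}


def classify(model, tokens):
    """Return the category with the highest posterior log score."""
    scores = {}
    for c, prior in model["log_prior"].items():
        ll = model["log_likelihood"][c]
        # Words never seen in training are skipped rather than penalized.
        scores[c] = prior + sum(ll[w] for w in tokens if w in model["vocab"])
    return max(scores, key=scores.get)


model = train_naive_bayes(TOY_DOCS)
print(classify(model, ["dolar", "faiz"]))
print(classify(model, ["doktor", "tedavi"]))
```

In a distributed setting the per-class word counts would be accumulated across partitions and then combined, but the scoring rule stays the same; how the paper configures that step is not stated in the abstract.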
