DC, LG · Sep 14, 2020

Performance Evaluation of Linear Regression Algorithm in Cluster Environment

arXiv:2009.06497v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses performance scaling for data mining tasks in cluster environments, but it is incremental, since it applies an existing method to a new dataset.

The paper evaluated the performance of linear regression for flight delay prediction in a cluster computing environment using Apache Spark, finding that a 5-node cluster improved computation performance by 39.76% compared to a standalone setup.
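The summary does not say how the 39.76% figure is defined. A minimal sketch, assuming the common convention of relative reduction in execution time, improvement = (T_standalone - T_cluster) / T_standalone × 100, is shown below together with the related speedup ratio. The function names and the timings in the usage note are hypothetical, not measurements from the paper.

```python
def percent_improvement(t_standalone: float, t_cluster: float) -> float:
    """Relative reduction in runtime, in percent (assumed definition, not confirmed by the paper)."""
    if t_standalone <= 0:
        raise ValueError("standalone time must be positive")
    return (t_standalone - t_cluster) / t_standalone * 100.0


def speedup(t_standalone: float, t_cluster: float) -> float:
    """Classic speedup ratio: how many times faster the cluster run is."""
    if t_cluster <= 0:
        raise ValueError("cluster time must be positive")
    return t_standalone / t_cluster
```

For example, with hypothetical runtimes of 100 s standalone and 60.24 s on the cluster, `percent_improvement(100.0, 60.24)` gives about 39.76, while `speedup` gives roughly 1.66. Under the other reading, where 39.76% means a speedup of 1.3976, the same percentage would imply a smaller time saving, so the two definitions should not be mixed when comparing results.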

Cluster computing was introduced as an alternative to supercomputers, and it can overcome problems that supercomputers cannot deal with effectively. In this paper, we evaluate the performance of cluster computing by executing a data mining technique in a cluster environment. The experiment predicts flight delays using the linear regression algorithm, with Apache Spark as the cluster computing framework. The results show that a cluster of 5 PCs with identical specifications increases computation performance by up to 39.76% compared with a standalone setup. Attaching more nodes to the cluster makes the process significantly faster.
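To illustrate why adding nodes can speed up linear regression training, here is a minimal sketch of data-parallel batch gradient descent in plain Python. Each partition stands in for one node: it computes a partial gradient over its own rows (the "map" step), and the partial gradients are summed (the "reduce" step). This is an illustrative stand-in, not Spark MLlib's actual solver, and the flight-delay features mentioned in the comments are hypothetical, since the paper's feature set is not listed here.

```python
from typing import List, Sequence, Tuple

# One training row: (features, label), e.g. hypothetical ([dep_delay, distance], arr_delay).
Row = Tuple[List[float], float]


def partial_gradient(rows: Sequence[Row], w: List[float], b: float) -> Tuple[List[float], float]:
    """Sum of squared-error gradients over one partition (one simulated node)."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in rows:
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - y  # prediction minus label
        for j, xj in enumerate(x):
            gw[j] += err * xj
        gb += err
    return gw, gb


def fit(partitions: Sequence[Sequence[Row]], n_features: int,
        lr: float = 0.01, epochs: int = 5000) -> Tuple[List[float], float]:
    """Batch gradient descent on mean squared error across all partitions."""
    n = sum(len(p) for p in partitions)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        # "Map": on a real cluster each partition would be processed in parallel.
        parts = [partial_gradient(p, w, b) for p in partitions]
        # "Reduce": sum the partial gradients, then average over all rows.
        gw = [sum(g[0][j] for g in parts) / n for j in range(n_features)]
        gb = sum(g[1] for g in parts) / n
        w = [wi - lr * gj for wi, gj in zip(w, gw)]
        b -= lr * gb
    return w, b
```

Because the gradient is a sum over rows, splitting the data across three partitions yields the same model as a single partition; only the per-epoch work is divided. For synthetic data generated from y = 2x + 1, `fit` recovers a weight close to 2 and an intercept close to 1.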

