LG, DC · Dec 7, 2015

A Novel Approach to Distributed Multi-Class SVM

arXiv:1512.01993v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient distributed multi-class SVM algorithms for large-scale data processing, representing an incremental advancement in combining distributed computing with multi-class classification.

The paper tackles the problem of scaling multi-class SVM to large datasets by proposing a distributed algorithm on Hadoop. The authors report that it predicts faster than traditional sequential methods (One vs. One, One vs. Rest), increasingly so as the dataset grows, and that it classifies with higher accuracy.

As data sizes keep expanding, classical machine learning algorithms that analyze such data require ever larger amounts of computation time and storage space, and the need to distribute computation and memory requirements among several computers has become apparent. Although substantial work has been done in developing distributed binary SVM algorithms and multi-class SVM algorithms individually, the field of multi-class distributed SVMs remains largely unexplored. This research proposes a novel algorithm that implements the Support Vector Machine over a multi-class dataset and is efficient in a distributed environment (here, Hadoop). The idea is to divide the dataset in half recursively and compute the optimal Support Vector Machine for each half during the training phase, much like a divide-and-conquer approach. During testing, this structure is exploited to significantly reduce the prediction time. Our algorithm shows better computation time during the prediction phase than the traditional sequential SVM methods (One vs. One, One vs. Rest) and outperforms them as the size of the dataset grows. This approach also classifies the data with higher accuracy than the traditional multi-class algorithms.
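The recursive halving described in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation and rests on several assumptions: the split is taken over the sorted set of class labels (the abstract does not state the splitting rule), a simple hinge-loss linear SVM trained by subgradient descent stands in for a real SVM solver, and the Hadoop distribution is omitted entirely. All names (`train_linear_svm`, `build_tree`, `predict`) and the toy data are hypothetical.

```python
def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Linear SVM via subgradient descent on the regularized hinge loss.

    y holds labels in {-1, +1}. This is a stand-in solver, not the paper's.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move the boundary toward this point.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def build_tree(X, labels):
    """Assumed splitting rule: halve the sorted class set at every node."""
    classes = sorted(set(labels))
    if len(classes) == 1:
        return {"leaf": classes[0]}
    right = set(classes[len(classes) // 2:])
    y = [1 if c in right else -1 for c in labels]
    w, b = train_linear_svm(X, y)
    lo = [(x, c) for x, c in zip(X, labels) if c not in right]
    hi = [(x, c) for x, c in zip(X, labels) if c in right]
    return {
        "w": w,
        "b": b,
        "left": build_tree([x for x, _ in lo], [c for _, c in lo]),
        "right": build_tree([x for x, _ in hi], [c for _, c in hi]),
    }


def predict(tree, x):
    """Walk from the root to a leaf; return (class, number of SVMs evaluated)."""
    node, evals = tree, 0
    while "leaf" not in node:
        score = sum(wj * xj for wj, xj in zip(node["w"], x)) + node["b"]
        node = node["right"] if score >= 0 else node["left"]
        evals += 1
    return node["leaf"], evals


# Toy data (made up for illustration): four well-separated 2-D clusters.
centers = {0: (-2.0, -2.0), 1: (-2.0, 2.0), 2: (2.0, -2.0), 3: (2.0, 2.0)}
offsets = [(0.0, 0.0), (0.3, 0.0), (-0.3, 0.0), (0.0, 0.3), (0.0, -0.3)]
X, labels = [], []
for c, (cx, cy) in centers.items():
    for dx, dy in offsets:
        X.append((cx + dx, cy + dy))
        labels.append(c)

tree = build_tree(X, labels)
for c, center in centers.items():
    print(c, predict(tree, center))
```

Under this assumed splitting rule, a prediction over four classes evaluates two binary classifiers along one root-to-leaf path, whereas One vs. One would evaluate all six pairwise classifiers and One vs. Rest all four. That gap is one plausible reading of why the tree structure shortens prediction time as the number of classes grows.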

