DB | LG | ML | Dec 12, 2015

Active Sampler: Light-weight Accelerator for Complex Data Analytics at Scale

arXiv:1512.03880v1 | 11 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of scaling complex data analytics by accelerating training with a light-weight method that is orthogonal to other efficiency optimizations. The advance is incremental, since it builds on existing SGD-based approaches.

The paper tackles the problem of inefficient data sampling in iterative model training by proposing Active Sampler, an algorithm that prioritizes data with high learning value near classification boundaries, resulting in a 1.6-2.2x speedup for training SVM, feature selection, and deep learning models while maintaining comparable quality.

Recent years have witnessed amazing outcomes from "Big Models" trained by "Big Data". Most popular algorithms for model training are iterative. Due to the surging volumes of data, we can usually afford to process only a fraction of the training data in each iteration. Typically, the data are either uniformly sampled or sequentially accessed. In this paper, we study how the data access pattern can affect model training. We propose an Active Sampler algorithm, where training data with more "learning value" to the model are sampled more frequently. The goal is to focus training effort on valuable instances near the classification boundaries, rather than evident cases, noisy data or outliers. We show the correctness and optimality of Active Sampler in theory, and then develop a light-weight vectorized implementation. Active Sampler is orthogonal to most approaches optimizing the efficiency of large-scale data analytics, and can be applied to most analytics models trained by the stochastic gradient descent (SGD) algorithm. Extensive experimental evaluations demonstrate that Active Sampler can speed up the training procedure of SVM, feature selection and deep learning, for comparable training quality by 1.6-2.2x.
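
To make the idea of sampling "valuable" instances more often concrete, here is a minimal Python sketch of non-uniform sampling inside mini-batch SGD for a linear SVM. It is not the paper's implementation. The abstract does not state how "learning value" is scored, so this sketch assumes the score is the norm of each instance's most recent hinge-loss subgradient. The function name `train_svm_weighted_sampling`, the `smooth` mixing parameter, and the importance-weight correction are all illustrative assumptions.

```python
import numpy as np

def train_svm_weighted_sampling(X, y, epochs=20, batch=32, lr=0.1,
                                reg=1e-3, smooth=0.1, seed=0):
    """Linear SVM (labels in {-1, +1}) trained by SGD with non-uniform sampling.

    ASSUMPTION: an instance's "learning value" is approximated by the norm
    of its most recent hinge-loss subgradient. The source abstract does not
    specify the scoring rule, so this is only one plausible choice.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    score = np.ones(n)  # equal scores at the start, i.e. uniform sampling

    for _ in range(epochs * (n // batch)):
        # Mix the score distribution with a uniform one so that every
        # instance keeps a non-zero chance of being revisited.
        p = (1.0 - smooth) * score / score.sum() + smooth / n
        idx = rng.choice(n, size=batch, p=p)

        # Hinge-loss subgradient per sampled instance: -y*x if margin < 1.
        margin = y[idx] * (X[idx] @ w)
        active = (margin < 1.0).astype(float)
        g = -(active * y[idx])[:, None] * X[idx]

        # Importance weights 1/(n*p_i) keep the averaged gradient an
        # unbiased estimate of the uniform-sampling gradient.
        iw = 1.0 / (n * p[idx])
        grad = (iw[:, None] * g).mean(axis=0) + reg * w
        w -= lr * grad

        # Refresh scores: instances classified with a comfortable margin
        # get a near-zero score and are sampled less often afterwards.
        score[idx] = np.linalg.norm(g, axis=1) + 1e-8

    return w
```

Calling `train_svm_weighted_sampling(X, y)` with a feature matrix `X` of shape `(n, d)` and labels `y` in `{-1, +1}` returns the weight vector. Instances far from the decision boundary produce zero subgradient, so their sampling probability decays toward the uniform floor `smooth / n`, which is the behaviour the abstract describes informally as avoiding "evident cases".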

