SAGE: Streaming Agreement-Driven Gradient Sketches for Representative Subset Selection
SAGE addresses efficient training for machine learning practitioners by offering a constant-memory alternative, though the contribution is incremental because it builds on existing subset-selection and sketching techniques.
The paper tackles the computational and energy intensity of training neural networks on large datasets by introducing SAGE, a streaming data-subset selection method that uses gradient sketches to prioritize examples, resulting in reduced compute and memory usage while maintaining competitive accuracy across benchmarks.
Training modern neural networks on large datasets is computationally and energy intensive. We present SAGE, a streaming data-subset selection method that maintains a compact Frequent Directions (FD) sketch of gradient geometry in $O(\ell D)$ memory and prioritizes examples whose sketched gradients align with a consensus direction. The approach eliminates $N \times N$ pairwise similarities and explicit $N \times \ell$ gradient stores, yielding a simple two-pass, GPU-friendly pipeline. Leveraging FD's deterministic approximation guarantees, we analyze how agreement scoring preserves gradient energy within the principal sketched subspace. Across multiple benchmarks, SAGE trains with small keep-rate budgets (the fraction of training examples retained) while maintaining competitive accuracy relative to full-data training and recent subset-selection baselines, and it reduces end-to-end compute and peak memory. Overall, SAGE offers a practical alternative whose memory is constant in the dataset size $N$ and that complements pruning and model compression for efficient training.
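The abstract does not define the consensus direction or the agreement score, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. `FrequentDirections` implements the standard FD shrinkage update on an $\ell \times D$ sketch (assuming $\ell \le D$). The selection routine assumes, hypothetically, that the consensus is the mean gradient projected onto the top-$k$ right singular vectors of the sketch and that the agreement score is cosine similarity within that subspace. The names `sage_like_select`, `grad_stream`, `k`, and `keep_rate` are illustrative, and per-example gradients are assumed to arrive as flat NumPy vectors.

```python
import numpy as np


class FrequentDirections:
    """Standard Frequent Directions sketch: an ell x D matrix B whose Gram
    matrix B^T B approximates A^T A for the streamed rows A, in O(ell * D) memory."""

    def __init__(self, ell: int, dim: int):
        assert 1 <= ell <= dim, "this sketch assumes ell <= D"
        self.ell = ell
        self.B = np.zeros((ell, dim))
        self.next_row = 0  # index of the next all-zero row to fill

    def _shrink(self) -> None:
        # Rotate B onto its singular basis and subtract the smallest squared
        # singular value from every squared singular value; this zeroes the last row.
        _, s, vt = np.linalg.svd(self.B, full_matrices=False)
        delta = s[-1] ** 2
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        self.B = s_shrunk[:, None] * vt
        self.next_row = self.ell - 1

    def update(self, row: np.ndarray) -> None:
        if self.next_row == self.ell:  # sketch is full: free a row first
            self._shrink()
        self.B[self.next_row] = row
        self.next_row += 1

    def basis(self, k: int) -> np.ndarray:
        # Top-k right singular vectors of B: a k x D orthonormal basis of the
        # principal sketched subspace.
        _, _, vt = np.linalg.svd(self.B, full_matrices=False)
        return vt[:k]


def sage_like_select(grad_stream, dim, ell, k, keep_rate):
    """Hypothetical two-pass, agreement-based selection.

    grad_stream: zero-argument callable returning a fresh iterable of
    per-example gradient vectors of length dim (called once per pass).
    """
    # Pass 1: build the FD sketch and a running gradient sum (O(ell*D + D) memory).
    fd = FrequentDirections(ell, dim)
    total = np.zeros(dim)
    n = 0
    for g in grad_stream():
        fd.update(g)
        total += g
        n += 1
    V = fd.basis(k)  # k x D
    # By linearity, projecting the mean equals averaging the projections.
    consensus = V @ (total / n)
    consensus /= np.linalg.norm(consensus) + 1e-12

    # Pass 2: score each example by cosine agreement with the consensus.
    scores = np.empty(n)  # N scalars, never an N x ell gradient store
    for i, g in enumerate(grad_stream()):
        z = V @ g
        scores[i] = (z @ consensus) / (np.linalg.norm(z) + 1e-12)

    n_keep = max(1, int(round(keep_rate * n)))
    keep = np.argsort(-scores)[:n_keep]  # highest-agreement examples
    return np.sort(keep), scores


# Usage on synthetic "gradients" that share a common component.
rng = np.random.default_rng(0)
N, D = 500, 64
shared = rng.normal(size=D)
G = 0.5 * shared + rng.normal(size=(N, D))
keep, scores = sage_like_select(lambda: G, dim=D, ell=16, k=8, keep_rate=0.1)
print(len(keep))  # 50
```

Pass 2 still holds $N$ scalar scores; a bounded top-$k$ heap would cut this to the kept-set size. Each FD shrink removes at least $\ell\delta$ of squared Frobenius mass while perturbing the Gram matrix by at most $\delta$ in spectral norm, which is why the sketch satisfies $\lVert A^\top A - B^\top B \rVert_2 \le \lVert A \rVert_F^2 / \ell$.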