LG · AI · IR · ML · May 26, 2022

Active Labeling: Streaming Stochastic Gradients

arXiv:2205.13255v3 · 2 citations · h-index: 108
Originality: Incremental advance
AI Analysis

This work addresses the need to train machine learning models with less labeled data. It offers a streaming method that brings an incremental improvement in efficiency for active learning tasks.

The paper tackles the problem of reducing supervision in stochastic gradient descent by introducing active labeling, a method that minimizes the generalization error per sample in a streaming setting, with demonstrated application to robust regression.

The workhorse of machine learning is stochastic gradient descent. To access stochastic gradients, it is common to iterate over the input/output pairs of a training dataset. Interestingly, it appears that one does not need full supervision to access stochastic gradients, which is the main motivation of this paper. After formalizing the "active labeling" problem, which focuses on active learning with partial supervision, we provide a streaming technique that provably minimizes the ratio of generalization error over the number of samples. We illustrate our technique in depth for robust regression.
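
To make the claim that full supervision is not needed concrete, here is a minimal sketch under simplifying assumptions: scalar outputs, a linear model, and the absolute-deviation loss |<w, x> - y|. A subgradient of that loss depends on the label only through the sign of <w, x> - y, so a single yes/no answer to "is y below the current prediction?" is enough for one SGD step. The function names (`active_label_sgd`, `below_threshold`), the synthetic data, and the step-size schedule are illustrative assumptions, not the paper's algorithm or its released code.

```python
import numpy as np


def active_label_sgd(xs, below_threshold, step_size=1.0):
    """Averaged SGD on the absolute-deviation loss using one-bit label queries.

    xs: array of shape (n, d), inputs seen one at a time (streaming).
    below_threshold: hypothetical oracle; below_threshold(i, t) answers
        "is the hidden label y_i strictly below t?" with a bool.
    """
    n, d = xs.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for i in range(n):
        x = xs[i]
        pred = w @ x
        # One-bit query: this is the only access to y_i.
        sign = 1.0 if below_threshold(i, pred) else -1.0
        # Subgradient of |<w, x> - y| with respect to w is sign(<w, x> - y) * x.
        w = w - step_size / np.sqrt(i + 1) * sign * x
        # Running average of the iterates.
        w_avg += (w - w_avg) / (i + 1)
    return w_avg


# Synthetic data (assumed for illustration): linear signal plus Laplace noise.
rng = np.random.default_rng(0)
n = 5000
w_true = np.array([1.0, -2.0])
xs = rng.normal(size=(n, 2))
ys = xs @ w_true + rng.laplace(scale=0.1, size=n)


def oracle(i, threshold):
    # The learner never sees ys[i], only this comparison.
    return bool(ys[i] < threshold)


w_hat = active_label_sgd(xs, oracle)
print("estimate:", w_hat, "target:", w_true)
```

Each sample is queried exactly once, and the learner chooses the threshold from its current prediction, which is what makes the labeling "active" in this toy setting.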

Code Implementations: 1 repo