LG · AI · DS · ML · Mar 7, 2018

Sever: A Robust Meta-Algorithm for Stochastic Optimization

arXiv:1803.02815v2 · 316 citations
Originality: Incremental advance
AI Analysis

This paper addresses outlier sensitivity in high-dimensional machine learning. It offers practitioners a scalable method with strong theoretical guarantees, though the advance appears incremental because it builds on existing base learners.

The paper addresses the brittleness of machine learning methods to structured outliers in high dimensions. It introduces Sever, a meta-algorithm that hardens base learners such as least squares or stochastic gradient descent against outliers. Sever is substantially more robust than the baselines on spam classification and drug design datasets. On the spam data with 1% corruptions, for example, it reaches 7.4% test error, compared with 13.4%-20.5% for the baselines.

In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers. To address this, we introduce a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers. Our method, Sever, possesses strong theoretical guarantees yet is also highly scalable -- beyond running the base learner itself, it only requires computing the top singular vector of a certain $n \times d$ matrix. We apply Sever on a drug design dataset and a spam classification dataset, and find that in both cases it has substantially greater robustness than several baselines. On the spam dataset, with $1\%$ corruptions, we achieved $7.4\%$ test error, compared to $13.4\%-20.5\%$ for the baselines, and $3\%$ error on the uncorrupted dataset. Similarly, on the drug design dataset, with $10\%$ corruptions, we achieved $1.42$ mean-squared error test error, compared to $1.51$-$2.33$ for the baselines, and $1.23$ error on the uncorrupted dataset.
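The abstract describes the loop only at a high level: run the base learner, then compute the top singular vector of an $n \times d$ matrix. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the matrix is the set of centered per-sample loss gradients at the current fit, that each point is scored by its squared projection onto the top singular vector, and that a fixed fraction of the highest-scoring points is dropped each round. The function names, `n_rounds`, `drop_frac`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_least_squares(X, y, ridge=1e-6):
    """Base learner: least squares with a tiny ridge term for numerical stability."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)


def sever_least_squares(X, y, n_rounds=4, drop_frac=0.05):
    """Sever-style filtering around least squares (illustrative sketch).

    Assumptions not stated in the abstract: the n x d matrix is the matrix of
    centered per-sample gradients, and the top drop_frac of scores is removed
    in every round.
    """
    active = np.arange(X.shape[0])
    for _ in range(n_rounds):
        Xa, ya = X[active], y[active]
        w = fit_least_squares(Xa, ya)
        # Gradient of 0.5 * (x.w - y)^2 with respect to w is (x.w - y) * x.
        residuals = Xa @ w - ya
        grads = residuals[:, None] * Xa
        centered = grads - grads.mean(axis=0)
        # Top right singular vector of the centered gradient matrix.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        v = vt[0]
        # Outlier score: squared projection onto that direction.
        scores = (centered @ v) ** 2
        n_drop = int(np.ceil(drop_frac * len(active)))
        keep = np.argsort(scores)[: len(active) - n_drop]
        active = active[np.sort(keep)]
    # Final fit on the points that survived filtering.
    return fit_least_squares(X[active], y[active]), active


# Synthetic demo: 500 clean points plus 50 clustered outliers (made-up data).
rng = np.random.default_rng(0)
n_clean, n_out, d = 500, 50, 5
w_true = np.ones(d)
X_clean = rng.normal(size=(n_clean, d))
y_clean = X_clean @ w_true + 0.1 * rng.normal(size=n_clean)
X_out = 0.1 * rng.normal(size=(n_out, d))
X_out[:, 0] += 5.0
y_out = np.full(n_out, -20.0)
X = np.vstack([X_clean, X_out])
y = np.concatenate([y_clean, y_out])

w_plain = fit_least_squares(X, y)
w_sever, kept = sever_least_squares(X, y)
print("plain least squares error:", np.linalg.norm(w_plain - w_true))
print("filtered fit error:", np.linalg.norm(w_sever - w_true))
```

Dropping a fixed fraction per round keeps the sketch deterministic and easy to follow. Other removal rules, such as a randomized score threshold, would fit the same loop. A full SVD is used here for brevity; since only the top singular vector is needed, a power-iteration or sparse solver would be the natural choice at scale.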

Code Implementations: 1 repo
