ML LGJun 2, 2020

Improved scalability under heavy tails, without strong convexity

arXiv:2006.01364v22.71 citations

Originality Highly original

AI Analysis

This addresses the challenge of scalable learning under heavy-tailed data for practitioners dealing with real-world outliers, representing a strong specific gain rather than a foundational breakthrough.

The paper tackles the problem of machine learning with heavy-tailed losses and gradients without prior knowledge of data distribution, achieving improved dimension dependence in risk bounds and computational cost compared to existing robust gradient descent methods.

Real-world data is laden with outlying values. The challenge for machine learning is that the learner typically has no prior knowledge of whether the feedback it receives (losses, gradients, etc.) will be heavy-tailed or not. In this work, we study a simple algorithmic strategy that can be leveraged when both losses and gradients can be heavy-tailed. The core technique introduces a simple robust validation sub-routine, which is used to boost the confidence of inexpensive gradient-based sub-processes. Compared with recent robust gradient descent methods from the literature, dimension dependence (both risk bounds and cost) is substantially improved, without relying upon strong convexity or expensive per-step robustification. Empirically, we also show that under heavy-tailed losses, the proposed procedure cannot simply be replaced with naive cross-validation. Taken together, we have a scalable method with transparent guarantees, which performs well without prior knowledge of how "convenient" the feedback it receives will be.

View on arXiv PDF

Similar