LG | Jan 27, 2023

Meta-Learning Mini-Batch Risk Functionals

arXiv:2301.11724v1 | h-index: 58
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck for deep learning practitioners who need to optimize objectives other than expected loss. The advance is incremental because it builds on existing meta-learning and risk functional concepts.

The paper tackles the problem of optimizing non-standard risk functionals, such as Conditional Value at Risk, in deep learning with mini-batch gradient descent, where unbiased estimators are lacking. It introduces a meta-learning method that learns an interpretable mini-batch risk functional during training. The learned functionals reduce risk by up to 10% over hand-engineered ones, and improve over the baseline by 14% relative when the right risk functional is not known in advance.

Supervised learning typically optimizes the expected value risk functional of the loss, but in many cases, we want to optimize for other risk functionals. In full-batch gradient descent, this is done by taking gradients of a risk functional of interest, such as the Conditional Value at Risk (CVaR) which ignores some quantile of extreme losses. However, deep learning must almost always use mini-batch gradient descent, and the lack of unbiased estimators of various risk functionals makes the right optimization procedure unclear. In this work, we introduce a meta-learning-based method of learning an interpretable mini-batch risk functional during model training, in a single shot. When optimizing for various risk functionals, the learned mini-batch risk functionals lead to risk reduction of up to 10% over hand-engineered mini-batch risk functionals. In a setting where the right risk functional is unknown a priori, our method improves over baseline by 14% relative (~9% absolute). We analyze the learned mini-batch risk functionals at different points through training, and find that they learn a curriculum (including warm-up periods), and that their final form can be surprisingly different from the underlying risk functional that they optimize for.
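To make the idea of a hand-engineered mini-batch risk functional concrete, here is a minimal sketch in plain Python. It is not the paper's method. It assumes that a mini-batch risk functional can be written as a weighted sum over the sorted per-example losses, and it uses the common tail-average definition of CVaR (the mean of the largest fraction `alpha` of losses), which may differ from the convention used in the paper. The names `sorted_weight_risk`, `mean_weights`, and `cvar_weights` are hypothetical and chosen only for illustration.

```python
import math


def sorted_weight_risk(losses, weights):
    """Weighted sum of the mini-batch losses after sorting them in ascending order."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, sorted(losses)))


def mean_weights(n):
    """Uniform weights: recovers the usual expected-value (mean) mini-batch loss."""
    return [1.0 / n] * n


def cvar_weights(n, alpha):
    """Uniform weights on the ceil(alpha * n) largest losses, zero elsewhere.

    Assumption: tail-average CVaR; other conventions exist.
    """
    k = max(1, math.ceil(alpha * n))
    return [0.0] * (n - k) + [1.0 / k] * k


# Illustrative per-example losses for one mini-batch (made-up numbers).
batch_losses = [0.2, 1.5, 0.1, 0.9, 3.0, 0.4, 0.3, 0.6]
n = len(batch_losses)

mean_risk = sorted_weight_risk(batch_losses, mean_weights(n))
cvar_risk = sorted_weight_risk(batch_losses, cvar_weights(n, alpha=0.25))

print("mean risk:", mean_risk)
print("CVaR risk:", cvar_risk)
```

In this sketch, the mini-batch CVaR depends only on the two largest losses in the batch, so its value computed on a small batch is generally not an unbiased estimate of the CVaR of the full dataset. That is the gap the abstract points to when it says the right mini-batch procedure is unclear.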

