LGCYOct 22, 2022

On-Demand Sampling: Learning Optimally from Multiple Distributions

arXiv:2210.12529v349 citationsh-index: 25
Originality Incremental advance
AI Analysis

This addresses efficiency challenges in fairness, robustness, and collaborative settings for machine learning practitioners, offering incremental algorithmic improvements.

The paper tackles the problem of multi-distribution learning, where a learner aims to minimize expected loss uniformly across multiple data distributions with minimal samples, establishing optimal sample complexity bounds that exceed single-distribution learning by only an additive factor of n log(n)/ε², improving upon prior works by multiplicative factors.

Social and real-world considerations such as robustness, fairness, social welfare and multi-agent tradeoffs have given rise to multi-distribution learning paradigms, such as collaborative learning, group distributionally robust optimization, and fair federated learning. In each of these settings, a learner seeks to uniformly minimize its expected loss over $n$ predefined data distributions, while using as few samples as possible. In this paper, we establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity. Importantly, our sample complexity bounds for multi-distribution learning exceed that of learning a single distribution by only an additive factor of $n \log(n) / ε^2$. This improves upon the best known sample complexity bounds for fair federated learning by Mohri et al. and collaborative learning by Nguyen and Zakynthinou by multiplicative factors of $n$ and $\log(n)/ε^3$, respectively. We also provide the first sample complexity bounds for the group DRO objective of Sagawa et al. To guarantee these optimal sample complexity bounds, our algorithms learn to sample from data distributions on demand. Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving stochastic zero-sum games. In particular, we contribute stochastic variants of no-regret dynamics that can trade off between players' differing sampling costs.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes