LGMLJul 22, 2023

The Sample Complexity of Multi-Distribution Learning for VC Classes

arXiv:2307.12135v1
Originality Synthesis-oriented
AI Analysis

This work addresses a theoretical gap in multi-distribution learning, which is incremental progress for the machine learning theory community.

The paper tackles the sample complexity gap in multi-distribution learning for VC classes, establishing an upper bound of O(ε^{-2} ln(k)(d + k) + min{ε^{-1} dk, ε^{-4} ln(k) d}) and a lower bound of Ω(ε^{-2}(d + k ln(k))).

Multi-distribution learning is a natural generalization of PAC learning to settings with multiple data distributions. There remains a significant gap between the known upper and lower bounds for PAC-learnable classes. In particular, though we understand the sample complexity of learning a VC dimension d class on $k$ distributions to be $O(ε^{-2} \ln(k)(d + k) + \min\{ε^{-1} dk, ε^{-4} \ln(k) d\})$, the best lower bound is $Ω(ε^{-2}(d + k \ln(k)))$. We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes