LG MLJul 22, 2023

The Sample Complexity of Multi-Distribution Learning for VC Classes

Pranjal Awasthi, Nika Haghtalab, Eric Zhao

arXiv:2307.12135v12.0

Originality Synthesis-oriented

AI Analysis

This work addresses a theoretical gap in multi-distribution learning, which is incremental progress for the machine learning theory community.

The paper tackles the sample complexity gap in multi-distribution learning for VC classes, establishing an upper bound of O(ε^{-2} ln(k)(d + k) + min{ε^{-1} dk, ε^{-4} ln(k) d}) and a lower bound of Ω(ε^{-2}(d + k ln(k))).

Multi-distribution learning is a natural generalization of PAC learning to settings with multiple data distributions. There remains a significant gap between the known upper and lower bounds for PAC-learnable classes. In particular, though we understand the sample complexity of learning a VC dimension d class on $k$ distributions to be $O(ε^{-2} \ln(k)(d + k) + \min\{ε^{-1} dk, ε^{-4} \ln(k) d\})$, the best lower bound is $Ω(ε^{-2}(d + k \ln(k)))$. We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.

View on arXiv PDF

Similar