Provable Data Scaling Law for Meta Learning via Complexity Minimization
This work provides a theoretical foundation for the empirical observation that more pre-training data improves downstream sample efficiency in meta-learning, addressing a gap in existing theoretical frameworks.
The paper introduces a complexity minimization framework for meta-representation learning that provably captures the scaling behavior where downstream sample efficiency improves with increased pre-training data. The framework is shown to reduce error rates in few-shot adaptation as meta-training data grows, and empirical results demonstrate that adding complexity regularization to existing meta-learning methods consistently improves downstream sample efficiency.
Pre-training has become a fundamental paradigm in modern machine learning, and one of its key empirical benefits is that downstream sample complexity falls as the scale of pre-training data increases. However, existing theoretical frameworks for pre-training do not fully explain this phenomenon. In this paper, we introduce complexity minimization, a novel meta-representation learning framework designed to enable theoretical analysis of this scaling behavior. The framework learns representations by evaluating, for each source domain, the downstream model complexity best suited to that domain, and then minimizing the worst case of this complexity across source domains. Our end-to-end theoretical analysis, spanning pre-training through downstream regression, shows that the framework provably captures this scaling behavior; in particular, the error rate of few-shot adaptation improves as the amount of meta-training data grows. Empirically, we demonstrate that incorporating complexity regularization into existing meta-learning methods consistently improves downstream sample efficiency.
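One way to read the objective described in the abstract is as a min-max problem over representations. The sketch below is illustrative only: the notation is assumed here and is not taken from the paper. It uses a representation class $\Phi$, source domains $t = 1, \dots, T$, an empirical loss $\hat{L}_t$ on domain $t$, a complexity measure $\mathrm{comp}(\cdot)$ on downstream predictors $f$, and a fit tolerance $\varepsilon$.

```latex
% Hypothetical notation: the per-domain complexity is the smallest
% complexity of a downstream predictor that fits domain t on top of phi.
\[
  C_t(\phi) \;=\; \min_{f \,:\, \hat{L}_t(f \circ \phi) \,\le\, \varepsilon} \mathrm{comp}(f),
  \qquad
  \hat{\phi} \;\in\; \arg\min_{\phi \in \Phi} \; \max_{1 \le t \le T} \; C_t(\phi).
\]
```

Under this reading, a representation scores well only if every source domain can be fit by a simple downstream model on top of it, which is the property one would expect to translate into fewer samples needed for a new task. The paper's actual definitions of complexity and fit may differ.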