On the Statistical Benefits of Curriculum Learning
This work provides foundational theoretical insight into when and why curriculum learning helps, addressing a key gap in our understanding of a widely used training strategy.
The paper addresses the lack of theoretical understanding of curriculum learning's benefits. It analyzes curriculum learning in multitask linear regression, derives minimax rates for both the oracle and the adaptive settings, and shows that adaptive learning is fundamentally harder than oracle learning in unstructured settings but only slightly harder in structured ones.
Curriculum learning (CL) is a commonly used machine learning training strategy. However, we still lack a clear theoretical understanding of CL's benefits. In this paper, we study the benefits of CL in the multitask linear regression problem under both structured and unstructured settings. For both settings, we derive the minimax rates for CL with an oracle that provides the optimal curriculum, and without an oracle, in which case the agent has to adaptively learn a good curriculum. Our results reveal that adaptive learning can be fundamentally harder than oracle learning in the unstructured setting, whereas in the structured setting it merely introduces a small extra term. To connect theory with practice, we provide justification for a popular empirical method that selects the tasks with the highest local prediction gain, by comparing its guarantees with these minimax rates.
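To make the idea of "selecting tasks with the highest local prediction gain" concrete, here is a minimal sketch of one plausible instantiation in multitask linear regression. It is not the paper's algorithm, and every name and parameter below (`prediction_gain_curriculum`, `epsilon`, `ridge`, `noise_std`) is a hypothetical assumption. Each task $k$ has an unknown parameter $\theta_k$ and yields samples $y = x^\top \theta_k + \varepsilon$. The learner keeps a per-task ridge estimate, defines a task's local prediction gain as the drop in squared error on the freshly drawn sample after updating on it, and greedily picks the task with the largest most-recent gain, with $\epsilon$-greedy exploration. The sketch treats tasks as independent (no shared structure), so it is closer in spirit to the unstructured setting, and it does not model any target-task objective the paper may use.

```python
import numpy as np


def prediction_gain_curriculum(thetas, budget, noise_std=0.1, epsilon=0.1,
                               ridge=1.0, seed=0):
    """Hypothetical greedy curriculum over linear-regression tasks.

    thetas: array of shape (num_tasks, dim) holding the true task parameters
    (used only to simulate data). Returns per-task ridge estimates and the
    number of samples drawn from each task.
    """
    rng = np.random.default_rng(seed)
    num_tasks, dim = thetas.shape
    # Per-task ridge statistics: A_k = ridge * I + sum x x^T, b_k = sum y x.
    A = np.stack([ridge * np.eye(dim) for _ in range(num_tasks)])
    b = np.zeros((num_tasks, dim))
    # Unexplored tasks get infinite gain so greedy steps try each one first.
    gains = np.full(num_tasks, np.inf)
    counts = np.zeros(num_tasks, dtype=int)

    for _ in range(budget):
        # Epsilon-greedy choice: explore uniformly, else take the largest gain.
        if rng.random() < epsilon:
            k = int(rng.integers(num_tasks))
        else:
            k = int(np.argmax(gains))

        # Simulate one noisy sample from task k.
        x = rng.standard_normal(dim)
        y = x @ thetas[k] + noise_std * rng.standard_normal()

        # Squared error on this sample before and after updating on it.
        loss_before = (y - x @ np.linalg.solve(A[k], b[k])) ** 2
        A[k] += np.outer(x, x)
        b[k] += y * x
        loss_after = (y - x @ np.linalg.solve(A[k], b[k])) ** 2

        # Local prediction gain (nonnegative for a ridge update).
        gains[k] = loss_before - loss_after
        counts[k] += 1

    estimates = np.stack([np.linalg.solve(A[k], b[k]) for k in range(num_tasks)])
    return estimates, counts


if __name__ == "__main__":
    true_thetas = np.random.default_rng(1).standard_normal((3, 4))
    est, counts = prediction_gain_curriculum(true_thetas, budget=200)
    print("samples per task:", counts)
    print("max abs error:", np.abs(est - true_thetas).max())
```

The gain is stale for tasks that have not been picked recently, which is why the sketch mixes in uniform exploration; the bandit-style formulations used in practice handle this trade-off in more principled ways.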