LG · ML · Dec 9, 2018

Theory of Curriculum Learning, with Convex Loss Functions

arXiv:1812.03472v1 · 51 citations
AI Analysis

This work offers machine learning researchers foundational theoretical insight into curriculum learning. It analyzes how curriculum learning works in convex settings, addressing a long-standing gap in the theory.

The paper addresses the lack of theoretical analysis of curriculum learning. It defines an ideal difficulty score, the loss of the optimal hypothesis at a datapoint, and analyzes its effect on convergence rates in two convex problems: linear regression and binary classification with hinge loss. It shows that the expected convergence rate decreases monotonically with this score, in line with earlier empirical results. It also reconciles curriculum learning with the hard data mining heuristic.
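The ideal difficulty score can be computed once the optimal hypothesis is available. The following minimal Python sketch assumes a linear hypothesis `w_star` that is given by hand; in practice the optimal hypothesis is unknown and would have to be approximated. It scores each point by the loss of `w_star`, using squared loss for regression and hinge loss for labels in {-1, +1}, and then orders the data from easy to hard. The helper names and the toy data are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# A datapoint is (features, target); for hinge loss the target is a label in {-1, +1}.
Point = Tuple[Sequence[float], float]
Loss = Callable[[Sequence[float], Sequence[float], float], float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def squared_loss(w: Sequence[float], x: Sequence[float], y: float) -> float:
    # Linear regression: squared residual of hypothesis w at (x, y).
    return (dot(w, x) - y) ** 2


def hinge_loss(w: Sequence[float], x: Sequence[float], y: float) -> float:
    # Binary classification: hinge loss of hypothesis w, labels y in {-1, +1}.
    return max(0.0, 1.0 - y * dot(w, x))


def ideal_difficulty(w_star: Sequence[float], data: Sequence[Point], loss: Loss) -> List[float]:
    # Ideal difficulty score: the loss of the optimal hypothesis w_star at each point.
    # Assumption: w_star is known (hypothetical here; it must be estimated in practice).
    return [loss(w_star, x, y) for x, y in data]


def curriculum_order(w_star: Sequence[float], data: Sequence[Point], loss: Loss) -> List[int]:
    # Indices of the data sorted from easy (low score) to hard (high score).
    scores = ideal_difficulty(w_star, data, loss)
    return sorted(range(len(data)), key=lambda i: scores[i])


if __name__ == "__main__":
    w_star = [1.0, -2.0]  # toy "optimal" hypothesis, chosen by hand
    reg_data = [([1.0, 0.0], 1.5), ([0.0, 1.0], -2.0), ([1.0, 1.0], 0.0)]
    print(ideal_difficulty(w_star, reg_data, squared_loss))
    print(curriculum_order(w_star, reg_data, squared_loss))
```

A curriculum would then present the points in this order, or in easy-first batches. Because `sorted` is stable, ties keep their original order.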

Curriculum learning, the idea of teaching by gradually exposing the learner to examples in a meaningful order, from easy to hard, was investigated in the context of machine learning long ago. Although methods based on this concept have been empirically shown to improve the performance of several learning algorithms, no theoretical analysis has been provided even for simple cases. To address this shortfall, we start by formulating an ideal definition of difficulty score: the loss of the optimal hypothesis at a given datapoint. We analyze the possible contribution of curriculum learning based on this score in two convex problems: linear regression, and binary classification by hinge loss minimization. We show that in both cases, the expected convergence rate decreases monotonically with the ideal difficulty score, in accordance with earlier empirical results. We also prove that when the ideal difficulty score is fixed, the convergence rate is monotonically increasing with respect to the loss of the current hypothesis at each point. We discuss how these results reconcile two apparently contradictory heuristics: curriculum learning on the one hand, and hard data mining on the other.
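The following sketch makes the two monotonicity claims concrete with a simplified one-step calculation for linear regression. It is our own illustration under stated assumptions, not the paper's proof. Consider plain SGD with step size `eta` on the loss 0.5*(w.x - y)^2. Let a = (w - w*).x be the current error along x, and let r = w*.x - y, so r^2 is the ideal difficulty score. One step then reduces the squared distance to w* by Delta = 2*eta*(a + r)*a - eta^2*(a + r)^2*||x||^2.

- Assume a takes the values +alpha and -alpha with equal probability, independently of r. The average gain is then 2*eta*alpha^2 - eta^2*(alpha^2 + r^2)*||x||^2, which falls as r^2 grows.
- Assume instead that r is fixed and the current residual s = a + r takes the values +sqrt(l) and -sqrt(l) with equal probability, where l is the current squared residual. The average gain is then l*(2*eta - eta^2*||x||^2), which grows with l whenever eta*||x||^2 < 2.

All function names are hypothetical.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sgd_step(w: Sequence[float], x: Sequence[float], y: float, eta: float) -> List[float]:
    # One SGD step on 0.5 * (w.x - y)^2, whose gradient is (w.x - y) * x.
    s = dot(w, x) - y
    return [wi - eta * s * xi for wi, xi in zip(w, x)]


def gain(w, w_star, x, y, eta) -> float:
    # Decrease in ||w - w_star||^2 caused by one step on the point (x, y).
    before = sum((a - b) ** 2 for a, b in zip(w, w_star))
    after = sum((a - b) ** 2 for a, b in zip(sgd_step(w, x, y, eta), w_star))
    return before - after


def gain_closed_form(a: float, r: float, x_norm_sq: float, eta: float) -> float:
    # a = (w - w_star).x, r = w_star.x - y; (a + r) is the current residual.
    s = a + r
    return 2 * eta * s * a - eta ** 2 * s ** 2 * x_norm_sq


def mean_gain_vs_difficulty(alpha: float, r: float, x_norm_sq: float, eta: float) -> float:
    # Assumption: a = +alpha or -alpha with equal probability, independent of r.
    # Equals 2*eta*alpha^2 - eta^2*(alpha^2 + r^2)*x_norm_sq, decreasing in r^2.
    return 0.5 * (gain_closed_form(alpha, r, x_norm_sq, eta)
                  + gain_closed_form(-alpha, r, x_norm_sq, eta))


def mean_gain_vs_current_loss(loss_now: float, r: float, x_norm_sq: float, eta: float) -> float:
    # Assumption: r fixed, current residual s = +sqrt(loss_now) or -sqrt(loss_now) equally often.
    # Equals loss_now * (2*eta - eta^2*x_norm_sq), increasing in loss_now if eta*x_norm_sq < 2.
    s = math.sqrt(loss_now)
    return 0.5 * (gain_closed_form(s - r, r, x_norm_sq, eta)
                  + gain_closed_form(-s - r, r, x_norm_sq, eta))


if __name__ == "__main__":
    for r in [0.0, 0.5, 1.0, 2.0]:
        print("r^2 =", r * r, "mean gain =", mean_gain_vs_difficulty(1.0, r, 1.0, 0.5))
    for loss_now in [0.25, 1.0, 4.0]:
        print("loss =", loss_now, "mean gain =", mean_gain_vs_current_loss(loss_now, 0.7, 1.0, 0.5))
```

The direct `gain` and `gain_closed_form` agree for any single step, so the averaged versions only add the symmetry assumptions described above.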
