LG · Feb 4, 2022

Time-Constrained Learning

Sergio Filho, Eduardo Laber, Pedro Lazera, Marco Molinaro

arXiv:2202.01913v1 · 1.8 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the practical challenge of training a model efficiently under a time constraint. The advance is incremental: it builds on machine teaching principles to optimize how the dataset is used.

The paper tackles the problem of training a learner on a large labeled dataset under a strict time limit. It formalizes this as the Time-Constrained Learning Task (TCL) and proposes an algorithm called TCT. In experiments with 5 learners and 20 datasets, TCT consistently outperformed the alternatives tested: a black-box teacher, random sampling, and Stochastic Gradient Descent training. A stripped-down version of TCT also has provable guarantees, with almost exponential improvement over a batch teacher in some cases.

Consider a scenario in which we have a huge labeled dataset ${\cal D}$ and a limited time to train some given learner using ${\cal D}$. Since we may not be able to use the whole dataset, how should we proceed? Questions of this nature motivate the definition of the Time-Constrained Learning Task (TCL): Given a dataset ${\cal D}$ sampled from an unknown distribution $\mu$, a learner ${\cal L}$ and a time limit $T$, the goal is to obtain, in at most $T$ units of time, the classification model with the highest possible accuracy w.r.t. $\mu$ among those that can be built by ${\cal L}$ using the dataset ${\cal D}$. We propose TCT, an algorithm for the TCL task designed based on principles from Machine Teaching. We present an experimental study involving 5 different learners and 20 datasets, in which we show that TCT consistently outperforms two other algorithms: the first is a Teacher for black-box learners proposed in [Dasgupta et al., ICML 19], and the second is a natural adaptation of random sampling to the TCL setting. We also compare TCT with Stochastic Gradient Descent training, and our method is again consistently better. While our work is primarily practical, we also show that a stripped-down version of TCT has provable guarantees. Under reasonable assumptions, the time our algorithm takes to achieve a certain accuracy is never much larger than the time it takes the batch teacher (which sends a single batch of examples) to achieve similar accuracy, and in some cases it is almost exponentially better.
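To make the TCL setting concrete, the following is a minimal sketch of one possible random-sampling strategy under a time limit. It is not TCT, and it is not necessarily the random-sampling baseline evaluated in the paper. The names `fit_stump`, `accuracy`, `make_toy_data`, and `doubling_random_sampling`, the doubling schedule, and the use of a held-out validation set as a proxy for accuracy w.r.t. $\mu$ are all assumptions made for illustration.

```python
import random
import time


def make_toy_data(n, seed):
    """Hypothetical stand-in for the dataset D: 1-D points labeled True iff x >= 0.5."""
    rng = random.Random(seed)
    return [(x, x >= 0.5) for x in (rng.random() for _ in range(n))]


def fit_stump(sample):
    """Toy learner L (an assumption): pick the threshold with the fewest training errors."""
    best_threshold, best_errors = 0.0, len(sample) + 1
    for threshold, _ in sample:
        errors = sum((x >= threshold) != y for x, y in sample)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def accuracy(threshold, data):
    """Fraction of points in `data` that the threshold classifies correctly."""
    return sum((x >= threshold) == y for x, y in data) / len(data)


def doubling_random_sampling(train, valid, time_limit, fit=fit_stump,
                             start_size=8, seed=0):
    """Train on random samples of doubling size until the time limit T expires.

    Returns the model with the best validation accuracy found in time.
    Caveat: a single call to `fit` is not interrupted, so the total running
    time can exceed `time_limit` by the duration of the last call.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit
    best_model, best_acc = None, -1.0
    size = start_size
    while time.monotonic() < deadline:
        sample = rng.sample(train, min(size, len(train)))
        model = fit(sample)
        if time.monotonic() > deadline and best_model is not None:
            break  # this fit overran the budget; keep the earlier model
        acc = accuracy(model, valid)
        if acc > best_acc:
            best_model, best_acc = model, acc
        if size >= len(train):
            break  # the whole dataset has been used; nothing larger to try
        size *= 2
    return best_model, best_acc
```

For example, `doubling_random_sampling(make_toy_data(1000, seed=1), make_toy_data(300, seed=2), time_limit=1.0)` returns a threshold and its validation accuracy. A strategy of this kind chooses examples without regard to what the current model gets wrong, which is the aspect a Machine Teaching approach is meant to improve on.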
