LG · Apr 25, 2022

Reinforcement Teaching

Calarina Muslimani, Alex Lewandowski, Dale Schuurmans, Matthew E. Taylor, Jun Luo

arXiv:2204.11897v3 · 3.32 citations · h-index: 35

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving general-purpose learning algorithms, a problem relevant to machine learning researchers and practitioners. It appears incremental because it builds on existing meta-learning concepts.

The authors address a limitation of existing meta-learning methods: they either target one specific component of a learning algorithm or require the algorithm to be differentiable. To remove this restriction, they develop Reinforcement Teaching, a framework that learns a teaching policy to improve any learning algorithm, and report that it significantly outperforms previous methods in both reinforcement learning and supervised learning experiments.
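The teacher-student interaction can be illustrated with a deliberately small sketch. Everything in it is an assumption for illustration, not the authors' implementation: the student is plain gradient descent on a one-dimensional quadratic loss (the hypothetical `QuadraticStudent`), the teacher's action is the student's step size, and the teaching policy is an epsilon-greedy bandit (the hypothetical `BanditTeacher`) rather than the deep RL agent a full system would use. The teacher's reward is learning progress, measured here as the drop in the student's log loss after each step, which is one simple choice among several.

```python
import math
import random


class QuadraticStudent:
    """Hypothetical toy student: gradient descent on loss(w) = (w - 3)^2."""

    def __init__(self, w0: float = 0.0) -> None:
        self.w = w0

    def loss(self) -> float:
        return (self.w - 3.0) ** 2

    def step(self, lr: float) -> None:
        grad = 2.0 * (self.w - 3.0)
        self.w -= lr * grad


class BanditTeacher:
    """Hypothetical epsilon-greedy teacher over a fixed set of step sizes.

    It ignores the student's state, so it is a bandit rather than a full
    RL teaching policy; this is a simplification for illustration only.
    """

    def __init__(self, actions, epsilon: float = 0.1, seed: int = 0) -> None:
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.actions)
        self.values = [0.0] * len(self.actions)

    def act(self) -> int:
        # Try every action once before exploiting.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        return self.greedy()

    def greedy(self) -> int:
        return max(range(len(self.actions)), key=lambda i: self.values[i])

    def update(self, i: int, reward: float) -> None:
        # Incremental mean of the rewards observed for action i.
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]


def learning_progress(prev_loss: float, new_loss: float) -> float:
    """Assumed reward: drop in log loss between consecutive student steps."""
    return math.log(prev_loss) - math.log(new_loss)


def train_teacher(episodes: int = 50, steps: int = 10, seed: int = 0) -> BanditTeacher:
    teacher = BanditTeacher([0.01, 0.1, 0.3, 1.1], seed=seed)
    for _ in range(episodes):
        student = QuadraticStudent()  # fresh student each episode
        for _ in range(steps):
            i = teacher.act()
            prev = student.loss()
            student.step(teacher.actions[i])
            teacher.update(i, learning_progress(prev, student.loss()))
    return teacher


if __name__ == "__main__":
    teacher = train_teacher()
    print("preferred step size:", teacher.actions[teacher.greedy()])
```

With these settings the teacher settles on the step size 0.3, which shrinks the student's loss by the largest factor per step, and it assigns negative value to 1.1, which makes the student diverge. Measuring progress on a log scale makes each action's reward independent of how far the student already is from the optimum, which keeps this state-free bandit well posed; a teacher that observes a representation of the student would not need that simplification.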

Machine learning algorithms learn to solve a task, but are unable to improve their ability to learn. Meta-learning methods learn about machine learning algorithms and improve them so that they learn more quickly. However, existing meta-learning methods are either hand-crafted to improve one specific component of an algorithm or only work with differentiable algorithms. We develop a unifying meta-learning framework, called Reinforcement Teaching, to improve the learning process of any algorithm. Under Reinforcement Teaching, a teaching policy is learned, through reinforcement, to improve a student's learning algorithm. To learn an effective teaching policy, we introduce the parametric-behavior embedder that learns a representation of the student's learnable parameters from its input/output behavior. We further use learning progress to shape the teacher's reward, allowing it to more quickly maximize the student's performance. To demonstrate the generality of Reinforcement Teaching, we conduct experiments in which a teacher learns to significantly improve both reinforcement and supervised learning algorithms. Reinforcement Teaching outperforms previous work using heuristic reward functions and state representations, as well as other parameter representations.
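The motivation for representing a student by its behavior rather than its raw parameters can be illustrated without training anything. The sketch below is an assumption-laden stand-in, not the paper's embedder: the student is a hypothetical two-layer tanh network, and the learned embedding network is replaced by a fixed random feature map applied to each (input, output) pair on a fixed set of probe inputs, followed by mean pooling. Reordering the student's hidden units changes its parameter vector but not the function it computes, so a behavior-based representation stays the same while the raw parameters do not.

```python
import numpy as np


def mlp_forward(params, x):
    """Hypothetical student: a two-layer tanh network."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2


def permute_hidden(params, perm):
    """Reorder hidden units: different parameters, identical function."""
    W1, b1, W2, b2 = params
    return W1[:, perm], b1[perm], W2[perm, :], b2


def flatten(params):
    return np.concatenate([a.ravel() for a in params])


def behavior_embedding(params, probe_x, proj):
    """Embed the student from its input/output behavior on probe inputs.

    Each (input, output) pair goes through a fixed random feature map
    (a stand-in for a learned network), then is mean-pooled so the result
    does not depend on the order of the probe inputs.
    """
    y = mlp_forward(params, probe_x)
    pairs = np.concatenate([probe_x, y], axis=1)  # (n_probe, d_in + d_out)
    return np.tanh(pairs @ proj).mean(axis=0)


rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n_probe, d_embed = 3, 8, 2, 32, 16
params = (
    rng.normal(size=(d_in, d_hidden)),
    rng.normal(size=d_hidden),
    rng.normal(size=(d_hidden, d_out)),
    rng.normal(size=d_out),
)
# A fixed, non-identity reordering of the hidden units.
permuted = permute_hidden(params, np.roll(np.arange(d_hidden), 1))
probe_x = rng.normal(size=(n_probe, d_in))
proj = rng.normal(size=(d_in + d_out, d_embed))

param_gap = np.linalg.norm(flatten(params) - flatten(permuted))
embed_gap = np.linalg.norm(
    behavior_embedding(params, probe_x, proj)
    - behavior_embedding(permuted, probe_x, proj)
)
print(f"parameter distance: {param_gap:.3f}, embedding distance: {embed_gap:.2e}")
```

The parameter distance is large while the embedding distance is zero up to floating-point rounding. Mean pooling over the probe set additionally makes the embedding independent of the order in which probe inputs are presented.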

