LG · Apr 13, 2016

Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer

arXiv:1604.03986v1 · 43 citations
Originality: Incremental advance
AI Analysis

This work provides theoretical grounding for policy advice in reinforcement learning. The advance is incremental: it builds on existing transfer learning methods by adding formal analysis and quantifying negative transfer.

The paper addresses the lack of theoretical analysis in reinforcement learning transfer methods. It formalizes a setting in which multiple teachers advise a student, introduces an algorithm that combines autonomous exploration with that advice, and derives regret bounds showing that good teachers help and bad teachers hurt. It also quantifies, for the first time in this setting, when negative transfer can occur.

Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher's advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.
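The intuition that good teachers help and bad teachers hurt can be illustrated with a toy simulation. The sketch below is not the paper's algorithm and makes several assumptions of its own: the environment is reduced to a single-state Bernoulli bandit, each teacher is a fixed policy that always recommends one arm, and the student uses UCB1 to choose between following a teacher and acting on its own epsilon-greedy estimates. All names (`run_student`, `mean_regret`, `ARM_MEANS`) are hypothetical.

```python
import math
import random


def run_student(arm_means, teachers, horizon=2000, epsilon=0.1, seed=0):
    """Return the cumulative pseudo-regret of one student run.

    arm_means: success probability of each arm (toy environment, assumed).
    teachers:  list of arm indices; teacher i always advises teachers[i].
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    best_mean = max(arm_means)

    # The student's own action-value estimates, updated from all experience,
    # including steps where it followed a teacher.
    q = [0.0] * n_arms
    q_count = [0] * n_arms

    # Source 0 is the student's own epsilon-greedy policy; sources 1.. are teachers.
    n_sources = 1 + len(teachers)
    src_value = [0.0] * n_sources
    src_count = [0] * n_sources

    regret = 0.0
    for t in range(1, horizon + 1):
        # UCB1 over sources: try each source once, then maximize the upper bound.
        untried = [s for s in range(n_sources) if src_count[s] == 0]
        if untried:
            source = untried[0]
        else:
            source = max(
                range(n_sources),
                key=lambda s: src_value[s]
                + math.sqrt(2 * math.log(t) / src_count[s]),
            )

        if source == 0:
            # Autonomous exploration: epsilon-greedy on the student's estimates.
            if rng.random() < epsilon:
                arm = rng.randrange(n_arms)
            else:
                arm = max(range(n_arms), key=lambda a: q[a])
        else:
            # Follow the chosen teacher's advice.
            arm = teachers[source - 1]

        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental mean updates for the arm and for the source that chose it.
        q_count[arm] += 1
        q[arm] += (reward - q[arm]) / q_count[arm]
        src_count[source] += 1
        src_value[source] += (reward - src_value[source]) / src_count[source]

        # Pseudo-regret: gap between the best arm's mean and the chosen arm's mean.
        regret += best_mean - arm_means[arm]
    return regret


def mean_regret(arm_means, teachers, runs=20):
    """Average pseudo-regret over several seeds to smooth out noise."""
    total = sum(run_student(arm_means, teachers, seed=s) for s in range(runs))
    return total / runs


ARM_MEANS = [0.1, 0.5, 0.9]  # arm 2 is best, arm 0 is worst (assumed values)
regret_alone = mean_regret(ARM_MEANS, teachers=[])
regret_good = mean_regret(ARM_MEANS, teachers=[2])  # teacher advises the best arm
regret_bad = mean_regret(ARM_MEANS, teachers=[0])   # teacher advises the worst arm
print(f"alone={regret_alone:.1f} good={regret_good:.1f} bad={regret_bad:.1f}")
```

Comparing `regret_bad` against `regret_alone` gives a rough empirical analogue of negative transfer in this toy setting: the student must spend some steps following the bad teacher before it can rule that teacher out. The paper's regret bounds make this kind of statement formally; the simulation only illustrates it under the assumptions listed above.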

