AIRO | Apr 15, 2019

Improving interactive reinforcement learning: What makes a good teacher?

arXiv:1904.06879v1 | 34 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of selecting an effective trainer in interactive reinforcement learning. The advance is incremental, since it builds on existing policy shaping methods.

The study investigated which characteristics of artificial agents make them effective trainers in interactive reinforcement learning. It found that learner-agents advised by a polymath agent, rather than a specialist agent, obtain higher rewards, converge faster, and behave more stably.

Interactive reinforcement learning has become an important apprenticeship approach to speed up convergence in classic reinforcement learning problems. One variant of interactive reinforcement learning is policy shaping, which uses a parent-like trainer to propose the next action to be performed and thereby reduces the search space through advice. On some occasions, the trainer may be another artificial agent that was itself trained using reinforcement learning methods and afterward becomes an advisor for other learner-agents. In this work, we analyze internal representations and characteristics of artificial agents to determine which agent may outperform others as a trainer-agent. Using a polymath agent as an advisor, as compared to a specialist agent, leads to a larger reward and faster convergence of the reward signal, and also to more stable behavior in terms of the state visit frequency of the learner-agents. Moreover, we analyze system interaction parameters in order to determine how influential they are in the apprenticeship process, and find that the consistency of feedback is much more relevant when dealing with different learner obedience parameters.
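The advice mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy chain environment, the function names, and the `feedback_prob` parameter are assumptions made for illustration, and the abstract only names feedback consistency and learner obedience without specifying how they are modeled. Here a trainer's Q-table proposes an action, the advice is correct with probability `consistency`, and the learner follows it with probability `obedience`.

```python
import random


def greedy(q_row):
    """Index of the highest-valued action (ties go to the lowest index)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def choose_action(q_row, trainer_q_row, rng, epsilon=0.1,
                  feedback_prob=0.3, consistency=0.9, obedience=0.8):
    """Pick the learner's action, optionally overridden by trainer advice.

    All parameter names are hypothetical; they are not taken from the paper.
    """
    n_actions = len(q_row)
    # The learner's own epsilon-greedy choice.
    if rng.random() < epsilon:
        own = rng.randrange(n_actions)
    else:
        own = greedy(q_row)
    # The trainer only offers advice on a fraction of the steps.
    if rng.random() >= feedback_prob:
        return own
    best = greedy(trainer_q_row)
    # Consistent advice is the trainer's greedy action;
    # inconsistent advice is some other action.
    if rng.random() < consistency:
        advice = best
    else:
        advice = rng.choice([a for a in range(n_actions) if a != best])
    # The learner follows the advice with probability `obedience`.
    return advice if rng.random() < obedience else own


def train_learner(trainer_q, n_states=6, episodes=50,
                  alpha=0.5, gamma=0.9, seed=0, **advice):
    """Q-learning on a toy chain (an assumed environment, not the paper's).

    Action 1 moves right, action 0 moves left; reaching the last state
    ends the episode with reward 1.0, every other step costs 0.01.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on steps per episode
            a = choose_action(q[s], trainer_q[s], rng, **advice)
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s2 == n_states - 1
            r = 1.0 if done else -0.01
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


# A hand-written trainer that always prefers moving right.
trainer_q = [[0.0, 1.0] for _ in range(6)]
learner_q = train_learner(trainer_q, epsilon=0.0, feedback_prob=1.0,
                          consistency=1.0, obedience=1.0)
```

Lowering `consistency` or `obedience` in the call above is one way to probe how sensitive the learner is to each interaction parameter in this toy setting.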

