AI RO Apr 15, 2019

Improving interactive reinforcement learning: What makes a good teacher?

Francisco Cruz, Sven Magg, Yukie Nagai, Stefan Wermter

arXiv:1904.06879v1 17.134 citations h-index: 43

Originality: Incremental advance

AI Analysis

This work addresses the problem of optimizing trainer selection in interactive reinforcement learning for AI apprenticeship, though it is incremental as it builds on existing policy shaping methods.

The study investigated which characteristics of artificial agents make them effective trainers in interactive reinforcement learning, finding that a polymath agent outperforms a specialist agent by achieving higher rewards, faster convergence, and more stable behavior in learner-agents.

Interactive reinforcement learning has become an important apprenticeship approach to speed up convergence in classic reinforcement learning problems. One variant of interactive reinforcement learning is policy shaping, which uses a parent-like trainer to propose the next action to be performed and thereby reduces the search space through advice. On some occasions, the trainer may be another artificial agent that was itself trained using reinforcement learning methods and afterward becomes an advisor for other learner-agents. In this work, we analyze internal representations and characteristics of artificial agents to determine which agent may outperform others as a trainer-agent. Using a polymath agent as an advisor, rather than a specialist agent, leads to a larger reward and faster convergence of the reward signal, and also to more stable behavior in terms of the state visit frequency of the learner-agents. Moreover, we analyze system interaction parameters to determine how influential they are in the apprenticeship process, and find that the consistency of feedback is much more relevant when dealing with different learner obedience parameters.
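
To make the interaction described in the abstract concrete, the following is a minimal Python sketch of one advice-driven action-selection step. It is an illustration under stated assumptions, not the authors' implementation: the function name `select_action`, the tabular Q-value dictionaries, and the parameters `feedback_prob`, `consistency`, `obedience`, and `epsilon` are all hypothetical names chosen here, as is the way inconsistent advice is modeled (swapping in a different action).

```python
import random


def select_action(learner_q, trainer_q, state, actions, rng,
                  feedback_prob=0.3, consistency=0.9, obedience=1.0,
                  epsilon=0.1):
    """Pick the learner's next action, optionally using trainer advice.

    All parameter names are illustrative assumptions, not taken from the paper.
    """
    # The learner's own epsilon-greedy choice from its tabular Q-values.
    if rng.random() < epsilon:
        own = rng.choice(actions)
    else:
        own = max(actions, key=lambda a: learner_q.get((state, a), 0.0))

    # The trainer only offers advice in a fraction of the steps.
    if rng.random() >= feedback_prob:
        return own

    # The trainer-agent advises the action that is greedy under its own Q-values.
    advice = max(actions, key=lambda a: trainer_q.get((state, a), 0.0))

    # Assumed model of inconsistent feedback: a different action is advised.
    if rng.random() >= consistency:
        others = [a for a in actions if a != advice]
        if others:
            advice = rng.choice(others)

    # The learner follows the advice only with probability `obedience`.
    return advice if rng.random() < obedience else own
```

Under these assumptions, lowering `consistency` makes the advice itself less reliable, while lowering `obedience` makes the learner ignore advice more often, so the two parameters can be varied independently when comparing trainer-agents.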
