AI · Sep 12, 2017

Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds

arXiv:1709.03969v2 · 39 citations
Originality: Incremental advance
AI Analysis

This paper addresses the inefficient training of AI agents in complex 3D worlds. The advance is incremental because it builds on existing human-in-the-loop reinforcement learning methods.

The paper tackles the problem of slow training in deep reinforcement learning for 3D navigation by incorporating human feedback to manage exploration, exploitation, and listening strategies, resulting in improved training speed and performance in Minecraft environments, with robustness to inaccurate or absent feedback.

We describe a method to use discrete human feedback to enhance the performance of deep learning agents in virtual three-dimensional environments by extending deep reinforcement learning to model the confidence and consistency of human feedback. This enables deep reinforcement learning algorithms to determine the most appropriate time to listen to the human feedback, exploit the current policy model, or explore the agent's environment. Managing the trade-off between these three strategies allows DRL agents to be robust to inconsistent or intermittent human feedback. Through experimentation using a synthetic oracle, we show that our technique improves the training speed and overall performance of deep reinforcement learning in navigating three-dimensional environments using Minecraft. We further show that our technique is robust to highly inaccurate human feedback and can also operate when no human feedback is given.
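
The three-way choice described in the abstract (listen, explore, or exploit) can be illustrated with a small sketch. This is not the paper's algorithm: the function name `choose_action`, the single scalar `feedback_confidence`, and the epsilon-greedy fallback are assumptions made for illustration, and the abstract does not say how confidence and consistency are actually modeled.

```python
import random


def choose_action(q_values, feedback_action, feedback_confidence, epsilon, rng=random):
    """Pick an action by listening, exploring, or exploiting.

    Hypothetical sketch, not the authors' method.
    q_values            -- list of estimated action values from the policy model
    feedback_action     -- action index suggested by the human, or None if absent
    feedback_confidence -- assumed trust in the feedback, in [0, 1]
    epsilon             -- exploration probability, in [0, 1]
    Returns (action_index, strategy_name).
    """
    n_actions = len(q_values)

    # Listen: follow the human suggestion with probability equal to our trust in it.
    if feedback_action is not None and rng.random() < feedback_confidence:
        return feedback_action, "listen"

    # Explore: otherwise take a uniformly random action with probability epsilon.
    if rng.random() < epsilon:
        return rng.randrange(n_actions), "explore"

    # Exploit: otherwise take the action the current policy model rates highest.
    best = max(range(n_actions), key=lambda a: q_values[a])
    return best, "exploit"
```

Under these assumptions, setting `feedback_confidence` to 0 or passing `feedback_action=None` reduces the rule to plain epsilon-greedy, which loosely mirrors the claim that the agent can still operate when feedback is inaccurate or absent.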

