LG · AI · ML · Jul 26, 2019

On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

arXiv:1907.11788v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses hard exploration in multi-agent RL. It is an incremental advance: it builds on existing methods to make exploration safer in one specific benchmark.

The paper tackles exploration in reinforcement learning within the Pommerman domain, which features sparse and deceptive rewards. It develops a model-based reasoning module that prunes actions certain to kill the agent, and the authors report that this significantly improves learning.

How to best explore in domains with sparse, delayed, and deceptive rewards is an important open problem for reinforcement learning (RL). This paper considers one such domain, the recently proposed multi-agent benchmark of Pommerman. This domain is very challenging for RL: past work has shown that model-free RL algorithms fail to achieve significant learning without artificially reducing the environment's complexity. In this paper, we illuminate reasons behind this failure by providing a thorough analysis of the hardness of random exploration in Pommerman. While model-free random exploration is typically futile, we develop a model-based automatic reasoning module that can be used for safer exploration by pruning actions that will surely lead the agent to death. We empirically demonstrate that this module can significantly improve learning.
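
To make the idea of pruning fatal actions concrete, here is a minimal sketch. It is not the authors' module: the grid, the action names, the bomb tuple layout `(position, timer, strength)`, and the helpers `blast_cells`, `safe_actions`, and `explore` are all hypothetical, and the model is deliberately simplified (one-step lookahead, blasts not blocked by walls). It only illustrates filtering the action set with a known environment model before sampling an exploratory action.

```python
import random
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]
Bomb = Tuple[Pos, int, int]  # (position, timer, strength); hypothetical layout

# Hypothetical action set: stay put or move one cell (row, col offsets).
ACTIONS: Dict[str, Pos] = {
    "stop": (0, 0),
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def blast_cells(bomb_pos: Pos, strength: int, size: int) -> Set[Pos]:
    """Cells hit by an exploding bomb: its own cell plus a cross of reach `strength`.

    Simplifying assumption: walls do not block the blast.
    """
    r, c = bomb_pos
    cells = {(r, c)}
    for d in range(1, strength + 1):
        for nr, nc in ((r - d, c), (r + d, c), (r, c - d), (r, c + d)):
            if 0 <= nr < size and 0 <= nc < size:
                cells.add((nr, nc))
    return cells


def safe_actions(agent: Pos, bombs: List[Bomb], size: int) -> List[str]:
    """Return the actions that do not leave the agent inside a blast next step.

    A bomb with timer <= 1 is assumed to explode on the next step.
    """
    deadly: Set[Pos] = set()
    for pos, timer, strength in bombs:
        if timer <= 1:
            deadly |= blast_cells(pos, strength, size)

    safe = []
    for name, (dr, dc) in ACTIONS.items():
        nr, nc = agent[0] + dr, agent[1] + dc
        if not (0 <= nr < size and 0 <= nc < size):
            nr, nc = agent  # moving off the board leaves the agent in place
        if (nr, nc) not in deadly:
            safe.append(name)
    return safe


def explore(agent: Pos, bombs: List[Bomb], size: int, rng: random.Random) -> str:
    """Sample a random action from the pruned set; fall back to all actions if none is safe."""
    candidates = safe_actions(agent, bombs, size) or list(ACTIONS)
    return rng.choice(candidates)
```

For example, with the agent at `(2, 2)` on a 5x5 board and a strength-1 bomb at `(2, 3)` about to explode, `safe_actions((2, 2), [((2, 3), 1, 1)], 5)` keeps only `up`, `down`, and `left`, so random exploration can no longer pick the two moves that would end the episode immediately.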

