LG AI ML · Jan 27, 2019

Modularization of End-to-End Learning: Case Study in Arcade Games

Andrew Melnik, Sascha Fleer, Malte Schilling, Helge Ritter

arXiv:1901.09895v1 · 6.0 · 12 citations

Originality: Incremental advance

AI Analysis

This paper addresses slow learning and poor generalization in game-playing AI. The advance is incremental: it builds on existing modular approaches rather than introducing a new paradigm.

The paper tackles the difficulty that complex environments pose for end-to-end learning by decomposing arcade games into controllable and non-controllable objects and handling each type with specialized modules based on regression, supervised learning, and reinforcement learning. Its results suggest that human-level performance is reachable within 10-15 minutes of game time when a proper decomposition is provided.

Complex environments and tasks pose a difficult problem for holistic end-to-end learning approaches. Decomposition of an environment into interacting controllable and non-controllable objects allows supervised learning for non-controllable objects and universal value function approximator learning for controllable objects. Such decomposition should lead to a shorter learning time and better generalization capability. Here, we consider arcade-game environments as sets of interacting objects (controllable, non-controllable) and propose a set of functional modules that are specialized in mastering different types of interactions in a broad range of environments. The modules utilize regression, supervised learning, and reinforcement learning algorithms. Results of this case study in different Atari games suggest that human-level performance can be achieved by a learning agent within a human amount of game experience (10-15 minutes of game time) when a proper decomposition of an environment or a task is provided. However, automatization of such decomposition remains a challenging problem. This case study shows how a model of the causal structure underlying an environment or a task can benefit learning time and generalization capability of the agent, and argues in favor of exploiting modular structure in contrast to using pure end-to-end learning approaches.
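
To make the controllable/non-controllable split concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a Pong-like game in which a ball (non-controllable) moves in a straight line and a paddle (controllable) must reach the point where the ball crosses the paddle's row. The names `fit_line`, `BallPredictor`, and `PaddleController` are hypothetical. The ball module uses ordinary least-squares regression on observed positions. The paddle module is a hand-written rule standing in for the learned universal value function approximator described in the abstract, and wall reflections are ignored.

```python
from typing import List, Optional, Tuple


def fit_line(ts: List[float], xs: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of x = a * t + b; returns (a, b)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_x = sum(xs) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        # All samples share one time stamp: no slope can be estimated.
        return 0.0, mean_x
    cov = sum((t - mean_t) * (x - mean_x) for t, x in zip(ts, xs))
    a = cov / var_t
    return a, mean_x - a * mean_t


class BallPredictor:
    """Module for a non-controllable object (assumed linear motion)."""

    def __init__(self) -> None:
        self.history: List[Tuple[float, float, float]] = []  # (t, x, y)

    def observe(self, t: float, x: float, y: float) -> None:
        self.history.append((t, x, y))

    def predict_x_at_y(self, target_y: float) -> Optional[float]:
        """Predict the ball's x coordinate when it reaches row target_y."""
        if len(self.history) < 2:
            return None
        ts = [h[0] for h in self.history]
        ax, bx = fit_line(ts, [h[1] for h in self.history])
        ay, by = fit_line(ts, [h[2] for h in self.history])
        if ay == 0:
            return None  # Ball is not moving vertically.
        t_hit = (target_y - by) / ay
        return ax * t_hit + bx


class PaddleController:
    """Module for the controllable object (rule-based stand-in, not learned)."""

    def act(self, paddle_x: float, target_x: Optional[float],
            tolerance: float = 1.0) -> str:
        if target_x is None or abs(target_x - paddle_x) <= tolerance:
            return "NOOP"
        return "RIGHT" if target_x > paddle_x else "LEFT"


predictor = BallPredictor()
for t, x, y in [(0, 10, 0), (1, 12, 5), (2, 14, 10)]:
    predictor.observe(t, x, y)

target = predictor.predict_x_at_y(50)
action = PaddleController().act(paddle_x=20, target_x=target)
print(target, action)
```

The point of the split is that each module solves a small, well-posed problem: the predictor needs only a few frames of data, and the controller needs only a target position rather than raw pixels.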
