AI · LG · Oct 30, 2022

Reward Shaping Using Convolutional Neural Network

arXiv:2210.16956v1 · 5 citations · h-index: 37
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reward shaping in reinforcement learning. For researchers and practitioners, it offers an incremental improvement: it automates inference of the transition matrix and improves learning performance.

The paper aims to make reinforcement learning more efficient by proposing VIN-RS, a potential-based reward shaping method that uses a CNN to predict shaping values from images or graphs of the environment. On tabular games, Atari 2600, and MuJoCo, it shows promising improvements in learning speed and maximum cumulative reward over state-of-the-art methods.
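Potential-based reward shaping (Ng et al., 1999) adds a term F(s, s') = γΦ(s') − Φ(s) to the environment reward, which leaves the optimal policy unchanged. The sketch below shows only that shaping rule, assuming a hand-written potential table `phi` and a hypothetical helper `shaped_reward`; in VIN-RS the potential Φ would instead be predicted by the CNN. Treating Φ of a terminal state as 0 is a common convention, not something the summary specifies.

```python
from typing import Dict, Hashable

State = Hashable


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Dict[State, float],
    gamma: float = 0.99,
    terminal: bool = False,
) -> float:
    """Return r + F(s, s') with F(s, s') = gamma * Phi(s') - Phi(s).

    Phi of a terminal next state is taken as 0 (a common convention).
    """
    next_phi = 0.0 if terminal else potential[next_state]
    return reward + gamma * next_phi - potential[state]


if __name__ == "__main__":
    # Hypothetical potentials for a 3-state chain; in VIN-RS these values
    # would come from the CNN rather than being written by hand.
    phi = {"s0": 0.0, "s1": 0.5, "s2": 1.0}
    print(shaped_reward(0.0, "s0", "s1", phi, gamma=0.9))  # 0.45
```

Because the shaping terms telescope, their discounted sum along a trajectory s0, s1, ..., sT equals γ^T Φ(sT) − Φ(s0), so they depend only on the endpoints. This is why a potential of this form can speed up learning without changing which policy is optimal.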

In this paper, we propose the Value Iteration Network for Reward Shaping (VIN-RS), a potential-based reward shaping mechanism built on a Convolutional Neural Network (CNN). VIN-RS embeds a CNN trained on labels computed with the message-passing mechanism of a Hidden Markov Model (HMM). The CNN processes images or graphs of the environment to predict shaping values. Recent work on reward shaping remains limited when it comes to training on a representation of the Markov Decision Process (MDP) and building an estimate of the transition matrix. VIN-RS addresses this by constructing an effective potential function from an estimated MDP while inferring the environment's transition matrix automatically: it estimates the transition matrix through a self-learned convolution filter while extracting environment details from the input frames or sampled graphs. Motivated by (1) the previous success of message passing for reward shaping and (2) the planning behavior of CNNs, we use these messages to train the CNN of VIN-RS. Experiments are performed on tabular games, Atari 2600, and MuJoCo, covering both discrete and continuous action spaces. Our results show promising improvements in learning speed and maximum cumulative reward compared to state-of-the-art methods.
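The idea of estimating transitions with a convolution filter comes from Value Iteration Networks. In a VIN, one step of value iteration is a convolution followed by a max over action channels. The following is a simplified, dependency-free sketch of that step, not the VIN-RS architecture. It assumes a grid world with a state-based reward R(s), no terminal states, and zero padding at the border. Where VIN-RS is described as learning the filter, the hypothetical `MOVE_KERNELS` below are hand-set to deterministic up/down/left/right moves. The names `conv3x3`, `vin_value_iteration`, and `MOVE_KERNELS` are illustrative only.

```python
from typing import List

Grid = List[List[float]]
Kernel = List[List[float]]  # 3x3 weights; kernel[1][1] is the current cell

# Hypothetical hand-set kernels for deterministic moves. VIN-RS is described as
# learning this filter from data; a learned kernel could also be stochastic.
MOVE_KERNELS: List[Kernel] = [
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # up: reads V(i-1, j)
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # down: reads V(i+1, j)
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # left: reads V(i, j-1)
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],  # right: reads V(i, j+1)
]


def conv3x3(values: Grid, kernel: Kernel) -> Grid:
    """Zero-padded 3x3 cross-correlation, as computed by a CNN layer."""
    h, w = len(values), len(values[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    # Cells outside the grid contribute 0 (zero padding).
                    if 0 <= ni < h and 0 <= nj < w:
                        acc += kernel[di + 1][dj + 1] * values[ni][nj]
            out[i][j] = acc
    return out


def vin_value_iteration(
    reward: Grid, kernels: List[Kernel], gamma: float = 0.9, iters: int = 20
) -> Grid:
    """Value iteration where each action's next-state value is a convolution.

    V_{k+1}(s) = R(s) + gamma * max_a (K_a conv V_k)(s)
    """
    h, w = len(reward), len(reward[0])
    v = [[0.0] * w for _ in range(h)]
    for _ in range(iters):
        q = [conv3x3(v, k) for k in kernels]  # one channel per action
        v = [
            [reward[i][j] + gamma * max(qa[i][j] for qa in q) for j in range(w)]
            for i in range(h)
        ]
    return v


if __name__ == "__main__":
    # 4x4 grid with a single rewarding goal cell in the top-left corner.
    reward = [[0.0] * 4 for _ in range(4)]
    reward[0][0] = 1.0
    for row in vin_value_iteration(reward, MOVE_KERNELS):
        print(" ".join(f"{x:5.2f}" for x in row))
```

The resulting value map decreases with distance from the goal. A potential-based shaping term can use exactly this kind of quantity as Φ. Replacing the hand-set kernels with trainable weights is what lets a VIN-style network infer transitions instead of being given them.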
