LG · AI · NE · Feb 7, 2022

Soft Actor-Critic with Inhibitory Networks for Faster Retraining

arXiv:2202.02918v2
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficiently reusing models in deep reinforcement learning when goals are contradictory, though it appears to be an incremental improvement over existing soft actor-critic methods.

The paper tackles the problem of retraining deep reinforcement learning agents when new objectives conflict with previously learned skills, proposing an inhibitory network approach that enables separate adaptive state value evaluations and automatic entropy tuning. Experimental validation in OpenAI Gym environments demonstrates improved handling of the exploration-exploitation trade-off during retraining.

Reusing previously trained models is critical in deep reinforcement learning to speed up training of new agents. However, it is unclear how to acquire new skills when objectives and constraints are in conflict with previously learned skills. Moreover, when retraining, there is an intrinsic conflict between exploiting what has already been learned and exploring new skills. In soft actor-critic (SAC) methods, a temperature parameter can be dynamically adjusted to weight the action entropy and balance the explore $\times$ exploit trade-off. However, controlling a single coefficient can be challenging within the context of retraining, even more so when goals are contradictory. In this work, inspired by neuroscience research, we propose a novel approach using inhibitory networks to allow separate and adaptive state value evaluations, as well as distinct automatic entropy tuning. Ultimately, our approach allows for controlling inhibition to handle conflict between exploiting less risky, acquired behaviors and exploring novel ones to overcome more challenging tasks. We validate our method through experiments in OpenAI Gym environments.
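
For reference, the dynamic temperature adjustment mentioned above can be sketched for standard single-temperature SAC. This is a minimal illustration of the commonly used objective J(alpha) = E[-alpha * (log pi(a|s) + target entropy)], not the inhibitory-network variant proposed in the paper; the function name `update_log_alpha`, the learning rate, and the sample values are assumptions made for the example.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on the standard SAC temperature objective.

    J(alpha) = mean(-alpha * (log_prob + target_entropy)),
    optimised over log_alpha so that alpha stays positive.
    All names and defaults here are illustrative assumptions.
    """
    alpha = math.exp(log_alpha)
    mean_log_prob = sum(log_probs) / len(log_probs)
    # dJ/d(log_alpha) = -alpha * (mean_log_prob + target_entropy)
    grad = -alpha * (mean_log_prob + target_entropy)
    return log_alpha - lr * grad


# Sampled action log-probabilities; estimated policy entropy is 0.5.
log_probs = [-0.5, -0.5]

# Entropy below the target: the temperature rises, encouraging exploration.
raised = update_log_alpha(0.0, log_probs, target_entropy=1.0)

# Entropy above the target: the temperature falls, favouring exploitation.
lowered = update_log_alpha(0.0, log_probs, target_entropy=-1.0)
```

A single coefficient of this kind rises when the policy's estimated entropy is below the target and falls when it is above, which is the one control knob the abstract describes as hard to manage during retraining.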
