Reinforcement Learning with a Focus on Adjusting Policies to Reach Targets
This work addresses exploration efficiency for reinforcement learning agents in practical applications, though it appears incremental.
The paper tackles the problem of high exploration costs in reinforcement learning by proposing a method that prioritizes achieving an aspiration level over maximizing expected return. In motion control and navigation tasks, the method achieves returns equal to or greater than those of standard methods.
The objective of a reinforcement learning agent is to discover better actions through exploration. However, typical exploration techniques aim to maximize rewards, often incurring high costs in both exploration and learning. We propose a novel deep reinforcement learning method that prioritizes achieving an aspiration level over maximizing expected return. The method flexibly adjusts the degree of exploration based on the proportion of target achievement. In experiments on a motion control task and a navigation task, it achieved returns equal to or greater than those of other standard methods. Our analysis showed two things: the method flexibly adjusts the exploration scope, and it has the potential to enable the agent to adapt to non-stationary environments. These findings indicate that the method may improve exploration efficiency in practical applications of reinforcement learning.
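The abstract does not spell out how exploration is tied to target achievement, so the following is only a minimal illustrative sketch, not the authors' method. It assumes an epsilon-greedy agent over discrete actions and a linear schedule. The names `achievement_ratio`, `exploration_rate`, `select_action`, `eps_min`, and `eps_max` are hypothetical and introduced here for illustration.

```python
import random


def achievement_ratio(recent_returns, aspiration):
    """Fraction of recent episodes whose return met the aspiration level.

    Assumption: "proportion of target achievement" is read here as a
    success rate over a window of recent episode returns.
    """
    if not recent_returns:
        return 0.0
    met = sum(1 for g in recent_returns if g >= aspiration)
    return met / len(recent_returns)


def exploration_rate(ratio, eps_min=0.01, eps_max=1.0):
    """Shrink exploration linearly as the achievement ratio rises.

    Assumption: a linear map is used purely for illustration; the
    paper may use a different rule.
    """
    ratio = min(max(ratio, 0.0), 1.0)  # clamp to [0, 1]
    return eps_max - (eps_max - eps_min) * ratio


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a list of action-value estimates."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

Under these assumptions, an agent that rarely reaches the aspiration level keeps exploring widely, while one that reaches it consistently mostly exploits. If the window of recent returns is kept short, a change in the environment lowers the ratio and exploration rises again, which is one way the adaptation to non-stationary environments mentioned above could arise.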