LG | May 28, 2023

On the Value of Myopic Behavior in Policy Reuse

Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li

arXiv:2305.17623v12.0

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently leveraging prior knowledge in reinforcement learning for tasks that are difficult to learn from scratch, offering a method that enhances policy reuse in domains like robotics.

The paper tackles policy reuse in reinforcement learning for unfamiliar tasks. It introduces the SMEC framework, which selectively aggregates the short-term behaviors of prior policies with the long-term behaviors of the task policy, and reports improved performance over existing methods on manipulation and locomotion tasks.

Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks. By evaluating the behaviors of prior policies via a hybrid value function architecture, SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions. Empirical results on a collection of manipulation and locomotion tasks demonstrate that SMEC outperforms existing methods, and validate the ability of SMEC to leverage related prior policies.
