LG, AI | Oct 18, 2024

Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping

Kavinayan P. Sivakumar, Yan Zhang, Zachary Bell, Scott Nivison, Michael M. Zavlanos

arXiv:2410.14484v1 | 2.61 citations | h-index: 40

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in transfer learning across heterogeneous action spaces by reducing reliance on handcrafted mappings or policy sharing, though the advance is incremental.

The paper tackles the problem of transferring reinforcement learning across agents with different action spaces by learning a subgoal mapping from expert demonstrations, which improves the learner's sample efficiency and training time on unseen tasks.

In this paper, we consider a transfer reinforcement learning problem involving agents with different action spaces. Specifically, for any new unseen task, the goal is to use a successful demonstration of this task by an expert agent in its action space to enable a learner agent to learn an optimal policy in its own different action space with fewer samples than it would need if it were learning on its own. Existing transfer learning methods across different action spaces either require handcrafted mappings between those action spaces provided by human experts, which can induce bias in the learning procedure, or require the expert agent to share its policy parameters with the learner agent, which does not generalize well to unseen tasks. In this work, we propose a method that learns a subgoal mapping between the expert agent policy and the learner agent policy. Since the expert agent and the learner agent have different action spaces, their optimal policies can have different subgoal trajectories. We learn this subgoal mapping by training a Long Short-Term Memory (LSTM) network on a distribution of tasks and then use this mapping to predict the learner subgoal sequence for unseen tasks, thereby improving the speed of learning by biasing the agent's policy towards the predicted learner subgoal sequence. Through numerical experiments, we demonstrate that the proposed learning scheme can effectively find the subgoal mapping underlying the given distribution of tasks. Moreover, letting the learner agent imitate the expert agent's policy with the learnt subgoal mapping can significantly improve the sample efficiency and training time of the learner agent on unseen tasks.
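
The abstract does not say how the learner's policy is biased towards the predicted subgoal sequence. One common way to do this kind of biasing is reward shaping, and the sketch below illustrates that idea only; it is not the paper's method. The grid-cell state encoding, the function name `shaped_reward`, the bonus size, the step cost, and the hard-coded subgoal list (a stand-in for the LSTM's prediction) are all assumptions made for illustration.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) grid position; a hypothetical state encoding


def shaped_reward(
    env_reward: float,
    position: Cell,
    subgoals: List[Cell],
    next_idx: int,
    bonus: float = 1.0,
) -> Tuple[float, int]:
    """Add a bonus when the learner reaches the next predicted subgoal.

    Returns the shaped reward and the index of the subgoal to aim for next.
    Subgoals count only in order; reaching a later one early earns no bonus.
    """
    if next_idx < len(subgoals) and position == subgoals[next_idx]:
        return env_reward + bonus, next_idx + 1
    return env_reward, next_idx


# Hypothetical predicted learner subgoal sequence (stand-in for the LSTM output).
predicted = [(0, 2), (2, 2), (2, 4)]
# A learner trajectory that passes through the first two subgoals only.
trajectory = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 3)]

idx, total = 0, 0.0
for cell in trajectory:
    # -0.1 is an assumed per-step cost from the environment.
    reward, idx = shaped_reward(-0.1, cell, predicted, idx)
    total += reward
```

In this sketch the learner collects a bonus each time it reaches the next subgoal in order, so a standard reinforcement learning algorithm trained on the shaped reward is nudged along the predicted sequence without the expert's actions ever being needed.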
