LG, AI | Nov 12, 2023

An advantage based policy transfer algorithm for reinforcement learning with measures of transferability

arXiv:2311.06731v2 | h-index: 6
Originality: Incremental advance
AI Analysis

This work aims to improve learning speed and performance in reinforcement learning for real-world applications where interactions are limited. It appears incremental, since it builds on existing transfer RL frameworks.

The paper tackles sample inefficiency and heuristic design choices in transfer reinforcement learning. It proposes APT-RL, an off-policy algorithm that uses advantage as a regularizer to weigh transferred knowledge. In high-dimensional continuous control tasks, APT-RL outperforms existing transfer RL methods and is at least as good as learning from scratch in adversarial scenarios.

Reinforcement learning (RL) enables sequential decision-making in complex and high-dimensional environments through interaction with the environment. In most real-world applications, however, a high number of interactions is infeasible. In these environments, transfer RL algorithms, which transfer knowledge from one or multiple source environments to a target environment, have been shown to increase learning speed and improve initial and asymptotic performance. However, most existing transfer RL algorithms are on-policy and sample inefficient, fail in adversarial target tasks, and often require heuristic choices in algorithm design. This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments. Its novelty is in using the popular notion of "advantage" as a regularizer to weigh the knowledge that should be transferred from the source relative to new knowledge learned in the target, removing the need for heuristic choices. Further, we propose a new transfer performance measure to evaluate the performance of our algorithm and unify existing transfer RL frameworks. Finally, we present a scalable, theoretically backed task similarity measurement algorithm to illustrate the alignment between our proposed transferability measure and the similarity between source and target environments. We compare APT-RL with several baselines, including existing transfer RL algorithms, in three high-dimensional continuous control tasks. Our experiments demonstrate that APT-RL outperforms existing transfer RL algorithms and is at least as good as learning from scratch in adversarial tasks.
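
The idea of using advantage to weigh transferred knowledge can be illustrated with a minimal sketch. This is not the APT-RL objective from the paper, whose exact form is not given here. It assumes a generic actor loss plus an imitation term that pulls the target policy toward the source policy's action, scaled by the clipped advantage of that source action under the target critic. All function and parameter names (`advantage`, `transfer_weight`, `regularized_actor_loss`) and the clipping rule are hypothetical choices made for illustration.

```python
from typing import Sequence


def advantage(q_value: float, state_value: float) -> float:
    """Standard advantage: A(s, a) = Q(s, a) - V(s)."""
    return q_value - state_value


def transfer_weight(adv_source_action: float) -> float:
    """Hypothetical weighting rule: trust the source action only when its
    advantage in the target task is positive; otherwise ignore it."""
    return max(0.0, adv_source_action)


def regularized_actor_loss(
    rl_loss: float,
    target_action: Sequence[float],
    source_action: Sequence[float],
    q_source_action: float,
    state_value: float,
) -> float:
    """Generic actor loss plus an advantage-weighted imitation penalty.

    rl_loss          loss of the target policy from ordinary RL (assumed given)
    target_action    action proposed by the target policy in state s
    source_action    action proposed by the source policy in state s
    q_source_action  target critic's estimate Q(s, source_action)
    state_value      target critic's estimate V(s)
    """
    weight = transfer_weight(advantage(q_source_action, state_value))
    # Squared distance between the two actions acts as the imitation term.
    imitation = sum((t - s) ** 2 for t, s in zip(target_action, source_action))
    return rl_loss + weight * imitation


if __name__ == "__main__":
    # Helpful source: positive advantage, so the imitation term is active.
    print(regularized_actor_loss(1.0, [1.0], [0.0], q_source_action=2.0, state_value=1.0))
    # Adversarial source: negative advantage, so the loss reduces to plain RL.
    print(regularized_actor_loss(1.0, [1.0], [0.0], q_source_action=0.0, state_value=1.0))
```

Under this assumed rule, a source action that the target critic rates below the state value gets zero weight, so training falls back to ordinary learning from scratch. That is one way to read the abstract's claim about adversarial tasks, though the paper's actual mechanism may differ.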

