CV · AI · Mar 22, 2023

$P^{3}O$: Transferring Visual Representations for Reinforcement Learning via Prompting

arXiv:2303.12371v2 · 2 citations · h-index: 13
Originality: Incremental advance
AI Analysis

The paper addresses visual input variability in reinforcement learning, which matters for applications such as gaming and robotics. The contribution appears incremental, since it builds on existing prompting and transfer learning methods.

The paper tackles the problem of transferring policies learned by deep reinforcement learning to new environments with different visual inputs. It introduces $P^{3}O$, a three-stage algorithm based on prompting, and reports that it outperforms state-of-the-art visual transfer schemes on the OpenAI CarRacing game and is more effective than retraining the policies.

It is important for deep reinforcement learning (DRL) algorithms to transfer their learned policies to new environments that have different visual inputs. In this paper, we introduce Prompt-based Proximal Policy Optimization ($P^{3}O$), a three-stage DRL algorithm that transfers visual representations from a target to a source environment by applying prompting. The process of $P^{3}O$ consists of three stages: pre-training, prompting, and predicting. In particular, we specify a prompt-transformer for representation conversion and propose a two-step training process to train the prompt-transformer for the target environment, while the rest of the DRL pipeline remains unchanged. We implement $P^{3}O$ and evaluate it on the OpenAI CarRacing video game. The experimental results show that $P^{3}O$ outperforms the state-of-the-art visual transfer schemes. In particular, $P^{3}O$ allows the learned policies to perform well in environments with different visual inputs, which is much more effective than retraining the policies in these environments.
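The three stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoder and policy head, the linear "visual shift", the paired source/target observations, and the least-squares fit (standing in for the paper's two-step prompt-transformer training) are all assumptions made for illustration, and every name below is hypothetical. The only point it tries to show is that the prompt is the sole trained component while the pre-trained pipeline stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (pre-training): stand-in for a pipeline already trained on the
# source environment. Both matrices stay frozen from here on (assumption:
# a linear encoder and a linear policy head).
W_enc = rng.normal(size=(8, 4))   # observation (8) -> feature (4)
W_pi = rng.normal(size=(4, 3))    # feature (4) -> action logits (3)


def policy_logits(obs, prompt=None):
    """Run the frozen pipeline; an optional prompt converts the observation first."""
    if prompt is not None:
        obs = obs @ prompt
    return obs @ W_enc @ W_pi


# Hypothetical visual shift: target observations are a fixed linear
# distortion of the source observations (assumption, for illustration only).
shift = np.eye(8) + 0.5 * rng.normal(size=(8, 8))
src_obs = rng.normal(size=(256, 8))
tgt_obs = src_obs @ shift

# Stage 2 (prompting): fit only the prompt so that converted target
# observations match the source observations. Least squares on paired data
# is a stand-in for the paper's two-step training, not a reproduction of it.
prompt, *_ = np.linalg.lstsq(tgt_obs, src_obs, rcond=None)

# Stage 3 (predicting): act in the target environment with the prompt
# placed in front of the unchanged, frozen policy.
actions_src = policy_logits(src_obs).argmax(axis=1)
actions_no_prompt = policy_logits(tgt_obs).argmax(axis=1)
actions_prompt = policy_logits(tgt_obs, prompt).argmax(axis=1)

agree_without = float((actions_no_prompt == actions_src).mean())
agree_with = float((actions_prompt == actions_src).mean())
print(f"action agreement without prompt: {agree_without:.2f}")
print(f"action agreement with prompt:    {agree_with:.2f}")
```

In this toy setting the prompt can undo the distortion almost exactly because the shift is linear and paired data is available; neither condition is claimed by the abstract, and a real visual shift would need a richer prompt model and a training signal that does not rely on paired observations.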

