LG · AI · Nov 29, 2024

Proto Successor Measure: Representing the Behavior Space of an RL Agent

arXiv:2411.19418v2 · 17 citations · h-index: 13 · ICML
Originality: Highly original
AI Analysis

The paper addresses zero-shot learning in RL for general-purpose agents, offering a foundational approach to representing behavior without task-specific assumptions.

The paper tackles zero-shot reinforcement learning: transferring an agent's knowledge to new tasks without additional interactions. It presents Proto Successor Measure, a basis set for all possible agent behaviors, proves that any visitation distribution can be represented as an affine combination of these bases, and derives an algorithm that recovers the optimal policy for any reward function without further environment interactions.
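The affine-combination claim has a simple tabular analogue that makes it concrete. The sketch below is an illustration, not the paper's derivation: it assumes a finite MDP with transition kernel P, initial state distribution μ, discount γ in [0, 1), and the normalized discounted visitation distribution d^π, all symbols introduced here for illustration. The paper itself learns its basis functions from reward-free interaction data rather than from a known model.

```latex
% Normalized discounted visitation distribution of a policy \pi (an assumed definition)
d^\pi(s,a) = (1-\gamma)\sum_{t \ge 0} \gamma^t \Pr\nolimits_\pi(s_t = s,\ a_t = a)

% Every policy's d^\pi satisfies the Bellman flow constraints
\sum_{a} d(s,a) = (1-\gamma)\,\mu(s) + \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a') \qquad \forall s

% Stacked over states: one linear system whose coefficients do not involve \pi
A\,d = (1-\gamma)\,\mu, \qquad A_{s,(s',a')} = \mathbb{1}[s = s'] - \gamma\, P(s \mid s',a')

% Its solution set is affine: a fixed offset b plus the span of a fixed basis \Phi
d = b + \Phi w, \qquad A\,b = (1-\gamma)\,\mu, \qquad \operatorname{span}(\Phi) = \ker A, \qquad \dim \ker A = |S|\,(|A| - 1)

% Only w depends on \pi; conversely, any d = b + \Phi w \ge 0 is the visitation
% distribution of \pi(a \mid s) = d(s,a) / \sum_{a'} d(s,a') at states with positive mass
```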

Having explored an environment, intelligent agents should be able to transfer their knowledge to most downstream tasks within that environment without additional interactions. Referred to as "zero-shot learning", this ability remains elusive for general-purpose reinforcement learning algorithms. While recent works have attempted to produce zero-shot RL agents, they make assumptions about the nature of the tasks or the structure of the MDP. We present Proto Successor Measure: the basis set for all possible behaviors of a reinforcement learning agent in a dynamical system. We prove that any possible behavior (represented using visitation distributions) can be represented using an affine combination of these policy-independent basis functions. Given a reward function at test time, we only need to find the linear weights that combine these bases into the visitation distribution of the optimal policy. We derive a practical algorithm to learn these basis functions using reward-free interaction data from the environment and show that our approach can produce the optimal policy at test time for any given reward function without additional environment interactions. Project page: https://agarwalsiddhant10.github.io/projects/psm.html.
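The test-time step can be sketched on a tiny tabular MDP. This is a minimal illustration under stated assumptions, not the paper's algorithm: the 2-state, 2-action MDP, the reward `R`, and the helpers `rref`, `affine_basis`, `best_weights`, and `policy_from` are hypothetical. The offset `b` and basis `phi` parameterize every solution of the Bellman flow constraints A d = (1 - γ)μ and are computed exactly from a known transition model, whereas the paper learns its bases from reward-free interaction data. The weights `w` are then chosen by brute-force vertex enumeration of the linear program max r·(b + Φw) subject to b + Φw ≥ 0, a naive stand-in for a proper LP solver.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical 2-state, 2-action MDP (illustrative only, not from the paper).
# P[s][a][s2] = probability of moving to s2 after taking action a in state s.
S, A = 2, 2
GAMMA = F(9, 10)
MU = [F(1), F(0)]                       # every episode starts in state 0
P = [[[F(1), F(0)], [F(0), F(1)]],      # state 0: action 0 stays, action 1 moves to state 1
     [[F(1), F(0)], [F(0), F(1)]]]      # state 1: action 0 moves to state 0, action 1 stays
IDX = [(s, a) for s in range(S) for a in range(A)]  # flat ordering of d(s, a)


def rref(rows, ncols):
    """Exact reduced row echelon form, pivoting only within the first ncols columns."""
    m, pivots, r = [row[:] for row in rows], [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            factor = m[i][c]
            if i != r and factor != 0:
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def affine_basis():
    """Offset b and basis phi with {d : A d = (1 - gamma) mu} = {b + sum_j w_j phi[j]}."""
    aug = [[F(int(s == s2)) - GAMMA * P[s2][a2][s] for (s2, a2) in IDX]
           + [(1 - GAMMA) * MU[s]] for s in range(S)]
    m, piv = rref(aug, len(IDX))
    free = [j for j in range(len(IDX)) if j not in piv]
    b = [F(0)] * len(IDX)
    for row, pc in zip(m, piv):
        b[pc] = row[-1]                  # particular solution: free coordinates set to 0
    phi = []
    for f in free:                       # one null-space direction per free coordinate
        v = [F(0)] * len(IDX)
        v[f] = F(1)
        for row, pc in zip(m, piv):
            v[pc] = -row[f]
        phi.append(v)
    return b, phi


def best_weights(reward):
    """Maximise the return of d = b + phi w subject to d >= 0 by enumerating vertices."""
    b, phi = affine_basis()
    k, n = len(phi), len(b)
    r = [reward[s][a] for (s, a) in IDX]
    best = None
    for tight in combinations(range(n), k):   # a vertex makes k coordinates of d zero
        system = [[phi[j][i] for j in range(k)] + [-b[i]] for i in tight]
        m, piv = rref(system, k)
        if len(piv) < k:
            continue                           # tight set does not pin down a unique w
        w = [row[-1] for row in m]
        d = [b[i] + sum(w[j] * phi[j][i] for j in range(k)) for i in range(n)]
        if min(d) < 0:
            continue                           # infeasible: not a visitation distribution
        value = sum(ri * di for ri, di in zip(r, d)) / (1 - GAMMA)  # discounted return
        if best is None or value > best[0]:
            best = (value, w, d)
    return best


def policy_from(d):
    """At a vertex the policy is deterministic: pick argmax_a d(s, a) (action 0 if unvisited)."""
    return [max(range(A), key=lambda a: d[s * A + a]) for s in range(S)]


R = [[F(0), F(0)],
     [F(0), F(1)]]                       # reward 1 only for staying in state 1
value, w, d = best_weights(R)
print(value, w, policy_from(d))          # 9 [Fraction(0, 1), Fraction(9, 10)] [1, 1]
```

Because the return is linear in `w`, a new reward only changes the objective of this small program while `b` and `phi` are reused unchanged, which is the sense in which no further environment interaction is needed. Exact `Fraction` arithmetic keeps the toy example free of tolerance issues; a realistic version would use floating point and an LP solver such as `scipy.optimize.linprog`.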

