LG · RO · Jun 16, 2023

$\pi2\text{vec}$: Policy Representations with Successor Features

arXiv:2306.09800v2 · 2 citations · h-index: 30
AI Analysis

This work addresses efficient policy selection in resource-constrained environments by integrating offline policy evaluation, foundation models, and policy representation. It appears incremental, since it combines existing lines of research.

The paper tackles the problem of representing the behavior of black-box policies as feature vectors to enable offline policy selection. The resulting method, π2vec, captures policy-induced changes in foundation model features without task-specific training.

This paper describes $\pi2\text{vec}$, a method for representing the behaviors of black-box policies as feature vectors. The policy representations capture, in a task-agnostic way, how the statistics of foundation model features change in response to the policy's behavior. They can be trained from offline data, which allows them to be used in offline policy selection. This work provides a key piece of a recipe for fusing together three modern lines of research: offline policy evaluation as a counterpart to offline RL, foundation models as generic and powerful state representations, and efficient policy selection in resource-constrained environments.
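To make the idea of a successor-feature policy representation concrete, here is a minimal sketch rather than the paper's implementation. It uses the standard definition of successor features, the expected discounted sum of state features visited under a policy, and estimates it by plain Monte Carlo averaging over trajectories. Several things are assumptions for illustration: the function names `discounted_feature_sum` and `policy_embedding`, the discount `gamma=0.99`, and the random arrays standing in for foundation model features. The sketch also assumes trajectories generated by each policy are available, whereas the abstract states that the representations are trained from offline data.

```python
import numpy as np


def discounted_feature_sum(features, gamma=0.99):
    """Discounted sum of per-step feature vectors for one trajectory.

    features: array-like of shape (T, d), one feature vector per time step
    (hypothetically produced by a frozen foundation model).
    """
    features = np.asarray(features, dtype=float)
    discounts = gamma ** np.arange(features.shape[0])  # shape (T,)
    return discounts @ features  # shape (d,)


def policy_embedding(trajectories, gamma=0.99):
    """Monte Carlo estimate of successor features, averaged over trajectories."""
    sums = [discounted_feature_sum(f, gamma) for f in trajectories]
    return np.mean(sums, axis=0)


# Toy stand-in data: 8 trajectories of 20 steps with 4-dimensional features.
# Real features would come from a foundation model; these are random numbers.
rng = np.random.default_rng(0)
policy_a = [rng.normal(0.0, 1.0, size=(20, 4)) for _ in range(8)]
policy_b = [rng.normal(0.5, 1.0, size=(20, 4)) for _ in range(8)]

emb_a = policy_embedding(policy_a)
emb_b = policy_embedding(policy_b)

# Policies that induce different feature statistics get different embeddings.
print("distance between embeddings:", np.linalg.norm(emb_a - emb_b))
```

Because each policy is reduced to a fixed-length vector, a downstream model (for example a regressor from embedding to measured return) could in principle be fit on policies with known performance and applied to new ones. That step is not shown here.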

