LG, Dec 5, 2024

Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting

arXiv:2412.04368v1, 12 citations, h-index: 15
Originality: Incremental advance
AI Analysis

This work aims to improve zero-shot policy adaptation in reinforcement learning. It is an incremental advance, since it builds on the existing forward-backward (FB) framework.

The authors address two limitations in training behavior foundation models (BFMs) for zero-shot policy adaptation in reinforcement learning. They introduce auto-regressive features to remove the constraint that tasks be encoded linearly, and they integrate offline RL techniques to improve performance. The resulting generic FB agent matches single-task offline agents such as IQL and XQL on the D4RL locomotion benchmark, while auto-regressive features bring moderate benefits on tasks requiring spatial precision and generalization.

The forward-backward representation (FB) is a recently proposed framework (Touati et al., 2023; Touati & Ollivier, 2021) to train behavior foundation models (BFMs) that aim to provide efficient zero-shot policies for any new task specified in a given reinforcement learning (RL) environment, without training for each new task. Here we address two core limitations of FB model training. First, FB, like all successor-feature-based methods, relies on a linear encoding of tasks: at test time, each new reward function is linearly projected onto a fixed set of pre-trained features. This limits the expressivity as well as the precision of the task representation. We break the linearity limitation by introducing auto-regressive features for FB, which let fine-grained task features depend on coarser-grained task information. This can represent arbitrary nonlinear task encodings, thus significantly increasing the expressivity of the FB framework. Second, it is well known that training RL agents from offline datasets often requires specific techniques. We show that FB works well together with such offline RL techniques, by adapting techniques from Nair et al. (2020b) and Cetin et al. (2024) for FB. This is necessary to get non-flatlining performance on some datasets, such as DMC Humanoid. As a result, we produce efficient FB BFMs for a number of new environments. Notably, on the D4RL locomotion benchmark, the generic FB agent matches the performance of standard single-task offline agents (IQL, XQL). In many setups, the offline techniques are needed to get any decent performance at all. The auto-regressive features have a positive but moderate impact, concentrated on tasks requiring spatial precision and task generalization beyond the behaviors represented in the training set.
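The contrast between a linear task encoding and an auto-regressive one can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the usual successor-feature-style estimate, where the task vector is an average of reward-weighted features over sampled states, and it models the auto-regressive variant as blocks of features that also take the previously computed (coarser) blocks of the task vector as input. All function names, the toy feature maps, and the dimensions are hypothetical.

```python
import numpy as np

def linear_task_encoding(B, states, rewards):
    """Assumed linear encoding: z = mean_i r(s_i) * B(s_i)."""
    feats = B(states)                                # shape (n, d)
    return (rewards[:, None] * feats).mean(axis=0)   # shape (d,)

def autoregressive_task_encoding(blocks, states, rewards):
    """Assumed auto-regressive encoding: block k of z is computed from
    features B_k(s, z_1..z_{k-1}) that see the coarser blocks of z."""
    z_parts = []
    for B_k in blocks:
        z_prev = np.concatenate(z_parts) if z_parts else np.zeros(0)
        feats = B_k(states, z_prev)                  # shape (n, d_k)
        z_parts.append((rewards[:, None] * feats).mean(axis=0))
    return np.concatenate(z_parts)

# Toy, hand-written feature maps (stand-ins for pre-trained networks).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))
W2 = rng.normal(size=(3, 2))

def B_linear(s):
    return s @ np.hstack([W1, W2])

def B_block1(s, z_prev):
    return s @ W1                                    # coarse block, ignores z_prev

def B_block2(s, z_prev):
    return (s @ W2) * (1.0 + z_prev.sum())           # fine block, depends on coarse z

states = rng.normal(size=(256, 3))
rewards = states[:, 0]                               # toy reward: first state coordinate

z_lin = linear_task_encoding(B_linear, states, rewards)
z_ar = autoregressive_task_encoding([B_block1, B_block2], states, rewards)

# Doubling the reward doubles the linear encoding, but not the auto-regressive one.
z_lin_2r = linear_task_encoding(B_linear, states, 2 * rewards)
z_ar_2r = autoregressive_task_encoding([B_block1, B_block2], states, 2 * rewards)
```

In this toy setup the first block of the auto-regressive encoding is still linear in the reward, while the second block is not, because its features change with the first block. That is the sense in which fine-grained task features can depend on coarser-grained task information.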

