LG | May 19, 2025

Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics

Maksim Bobrin, Ilya Zisman, Alexander Nikulin, Vladislav Kurenkov, Dmitry Dylov

arXiv:2505.13150v1 | 15.77 citations | h-index: 12 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a critical limitation for deploying Behavioral Foundation Models in real-world settings like robotics, where dynamics can change unexpectedly, though it appears incremental as it builds on existing FB representation methods.

The paper tackles the problem of Behavioral Foundation Models failing to adapt to unseen or changing dynamics at test time, which limits their real-world applicability. The proposed method, which uses a transformer-based belief estimator and dynamics-specific clustering, achieves up to 2x higher zero-shot returns compared to baselines in discrete and continuous tasks.

Behavioral Foundation Models (BFMs) have proved successful in producing policies for arbitrary tasks in a zero-shot manner, requiring no test-time training or task-specific fine-tuning. Among the most promising BFMs are those that estimate the successor measure, learned in an unsupervised way from task-agnostic offline data. However, these methods fail to react to changes in the dynamics, making them inefficient under partial observability or when the transition function changes. This hinders the applicability of BFMs in real-world settings, e.g., in robotics, where the dynamics can change unexpectedly at test time. In this work, we demonstrate that the Forward-Backward (FB) representation, one of the methods from the BFM family, cannot distinguish between distinct dynamics, leading to interference among the latent directions that parametrize different policies. To address this, we propose an FB model with a transformer-based belief estimator, which greatly facilitates zero-shot adaptation. We also show that partitioning the policy encoding space into dynamics-specific clusters, aligned with the context-embedding directions, yields an additional gain in performance. These traits allow our method to respond to the dynamics observed during training and to generalize to unseen ones. Empirically, in the changing dynamics setting, our approach achieves up to 2x higher zero-shot returns than the baselines for both discrete and continuous tasks.
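
For readers unfamiliar with FB representations, the following is a minimal sketch of how zero-shot task inference is commonly done in the generic FB framework: a task vector z is estimated as the reward-weighted average of backward embeddings B(s), and the policy acts greedily on F(s, a, z)ᵀz. It illustrates only the standard FB recipe, not the belief-estimator method proposed in this paper. The function names, the toy arrays, and the rescaling of z to norm sqrt(d) are assumptions made for illustration.

```python
import numpy as np

def infer_task_vector(B, rewards):
    """Estimate z as the mean of B(s_i) * r(s_i) over reward-labelled states.

    B: (n, d) backward embeddings; rewards: (n,) rewards for the same states.
    """
    z = (B * rewards[:, None]).mean(axis=0)
    norm = np.linalg.norm(z)
    d = B.shape[1]
    # Assumption: rescale z to norm sqrt(d), a common convention in FB code.
    return z if norm == 0 else np.sqrt(d) * z / norm

def greedy_action(F_sa, z):
    """Pick argmax_a F(s, a, z)^T z for a single state.

    F_sa: (num_actions, d) forward embeddings, assumed already computed for z.
    """
    return int(np.argmax(F_sa @ z))

# Toy example with hand-picked (hypothetical) embeddings.
B = np.array([[1.0, 0.0], [0.0, 1.0]])     # two states, d = 2
rewards = np.array([1.0, 0.0])             # only the first state is rewarded
z = infer_task_vector(B, rewards)
F_sa = np.array([[0.0, 1.0], [1.0, 0.0]])  # two actions in some state s
action = greedy_action(F_sa, z)
```

Because z is computed from a fixed set of reward-labelled states with no gradient step, a change in the transition function leaves this inference unchanged, which is consistent with the limitation the abstract describes.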
