LG · Mar 26, 2025

Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets

Alexander Levine, Peter Stone, Amy Zhang

arXiv:2503.21018v1 · 9.43 citations · h-index: 79

Originality: Incremental advance

AI Analysis

This work addresses a theoretical limitation in offline action-free learning for environments with temporally-correlated noise, offering a foundation for practical methods in similar settings, though it is incremental as it builds on prior Ex-BMDP models.

The paper addresses representation learning in high-dimensional sequential decision-making environments whose observations include uncontrollable noise features, under the Exogenous Block MDP (Ex-BMDP) model. It introduces CRAFT, an algorithm that uses action-free video data from multiple agents with differing policies to learn representations sample-efficiently, and it supports the algorithm with theoretical guarantees and a demonstration on a toy example.

While sequential decision-making environments often involve high-dimensional observations, not all features of these observations are relevant for control. In particular, the observation space may capture factors of the environment which are not controllable by the agent, but which add complexity to the observation space. The need to ignore these "noise" features in order to operate in a tractably-small state space poses a challenge for efficient policy learning. Due to the abundance of video data available in many such environments, task-independent representation learning from action-free offline data offers an attractive solution. However, recent work has highlighted theoretical limitations in action-free learning under the Exogenous Block MDP (Ex-BMDP) model, where temporally-correlated noise features are present in the observations. To address these limitations, we identify a realistic setting where representation learning in Ex-BMDPs becomes tractable: when action-free video data from multiple agents with differing policies are available. Concretely, this paper introduces CRAFT (Comparison-based Representations from Action-Free Trajectories), a sample-efficient algorithm leveraging differences in controllable feature dynamics across agents to learn representations. We provide theoretical guarantees for CRAFT's performance and demonstrate its feasibility on a toy example, offering a foundation for practical methods in similar settings.
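The setting described in the abstract can be illustrated with a small simulation. The sketch below is not CRAFT and is not taken from the paper; it is a hypothetical toy environment (the 4-cycle, the flip probability of 0.1, and the names `rollout`, `right_step_rate`, and `noise_flip_rate` are all assumptions made for illustration). It only shows the signal the abstract refers to: across agents with different policies, the dynamics of the controllable feature differ, while the temporally correlated noise feature evolves the same way for everyone. For simplicity the observation is the pair `(s, e)` itself rather than a high-dimensional emission.

```python
import random

def rollout(right_prob, steps, seed):
    """Generate one action-free trajectory from a toy Ex-BMDP-like process.

    s: controllable position on a 4-cycle, moved by the agent's action.
    e: exogenous bit that flips with probability 0.1, independent of the agent,
       so it is temporally correlated but uncontrollable.
    Only observations (s, e) are recorded; the actions are discarded.
    """
    rng = random.Random(seed)
    s, e = 0, 0
    traj = [(s, e)]
    for _ in range(steps):
        action = 1 if rng.random() < right_prob else -1  # agent's policy
        s = (s + action) % 4
        if rng.random() < 0.1:  # noise dynamics ignore the action
            e = 1 - e
        traj.append((s, e))
    return traj

def right_step_rate(traj):
    """Fraction of transitions in which s advanced by +1 on the cycle."""
    moves = [(b[0] - a[0]) % 4 == 1 for a, b in zip(traj, traj[1:])]
    return sum(moves) / len(moves)

def noise_flip_rate(traj):
    """Fraction of transitions in which the exogenous bit e changed."""
    flips = [a[1] != b[1] for a, b in zip(traj, traj[1:])]
    return sum(flips) / len(flips)

# Two agents with differing policies, observed without actions.
traj_a = rollout(right_prob=0.9, steps=5000, seed=0)
traj_b = rollout(right_prob=0.1, steps=5000, seed=1)

print("controllable feature:", right_step_rate(traj_a), right_step_rate(traj_b))
print("noise feature:       ", noise_flip_rate(traj_a), noise_flip_rate(traj_b))
```

In this toy case, comparing the two datasets separates the features: the transition statistics of `s` depend strongly on which agent produced the data, whereas those of `e` are nearly identical across agents.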
