LG · Dec 10, 2025

Latent Action World Models for Control with Unlabeled Trajectories

arXiv:2512.10016v1 · h-index: 44
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in offline reinforcement learning: it makes training more efficient when action-labeled data is limited. The advance is incremental, since it builds on existing world-model and RL techniques.

The paper tackles the problem of training world models when action labels are scarce. It introduces latent-action world models that learn from both action-conditioned and action-free data, and it achieves strong performance on the DeepMind Control Suite with about an order of magnitude fewer action-labeled samples than purely action-conditioned baselines.

Inspired by how humans combine direct interaction with action-free experience (e.g., videos), we study world models that learn from heterogeneous data. Standard world models typically rely on action-conditioned trajectories, which limits effectiveness when action labels are scarce. We introduce a family of latent-action world models that jointly use action-conditioned and action-free data by learning a shared latent action representation. This latent space aligns observed control signals with actions inferred from passive observations, enabling a single dynamics model to train on large-scale unlabeled trajectories while requiring only a small set of action-labeled ones. We use the latent-action world model to learn a latent-action policy through offline reinforcement learning (RL), thereby bridging two traditionally separate domains: offline RL, which typically relies on action-conditioned data, and action-free training, which is rarely used with subsequent RL. On the DeepMind Control Suite, our approach achieves strong performance while using about an order of magnitude fewer action-labeled samples than purely action-conditioned baselines. These results show that latent actions enable training on both passive and interactive data, which makes world models learn more efficiently.
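The shared latent action space described in the abstract can be illustrated with a minimal sketch. This is not the paper's architecture: the abstract does not specify the networks or losses, so the linear maps, the squared-error prediction loss, the alignment term, and every name below (`LatentActionWorldModel`, `encode_action`, `infer_latent`, `align_weight`) are assumptions made for illustration. The sketch shows only how one dynamics model could consume both labeled and unlabeled transitions through a common latent `z`.

```python
import random


def linear(W, x):
    """Apply a weight matrix W (a list of rows) to a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sq_err(u, v):
    """Sum of squared differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class LatentActionWorldModel:
    """Hypothetical linear latent-action world model (illustration only)."""

    def __init__(self, obs_dim, act_dim, latent_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_act = init(latent_dim, act_dim)            # action encoder: a -> z
        self.W_inv = init(latent_dim, 2 * obs_dim)        # inverse dynamics: (o, o') -> z
        self.W_dyn = init(obs_dim, obs_dim + latent_dim)  # dynamics: (o, z) -> o'

    def encode_action(self, action):
        """Map an observed control signal into the latent action space."""
        return linear(self.W_act, action)

    def infer_latent(self, obs, next_obs):
        """Infer a latent action from a passive (action-free) transition."""
        return linear(self.W_inv, obs + next_obs)

    def predict(self, obs, z):
        """Predict the next observation from the current one and a latent action."""
        return linear(self.W_dyn, obs + z)

    def loss(self, obs, next_obs, action=None, align_weight=1.0):
        """Per-transition loss for labeled or unlabeled data."""
        z_inferred = self.infer_latent(obs, next_obs)
        if action is None:
            # Action-free transition: the dynamics model sees only the inferred latent.
            return sq_err(self.predict(obs, z_inferred), next_obs)
        # Action-labeled transition: predict from the encoded action, and pull the
        # inferred latent toward it so both data sources share one latent space.
        z_action = self.encode_action(action)
        pred_loss = sq_err(self.predict(obs, z_action), next_obs)
        align_loss = sq_err(z_action, z_inferred)  # assumed alignment term
        return pred_loss + align_weight * align_loss


model = LatentActionWorldModel(obs_dim=3, act_dim=2, latent_dim=2)
obs, next_obs, action = [0.1, 0.2, 0.3], [0.2, 0.1, 0.3], [1.0, -1.0]
labeled_loss = model.loss(obs, next_obs, action)  # small labeled set
unlabeled_loss = model.loss(obs, next_obs)        # large unlabeled set
```

In a sketch like this, the inferred latent sees the next observation, so a practical model would need some bottleneck or regularizer to stop `z` from simply copying `next_obs`. A latent-action policy trained with offline RL would then output `z` directly and be rolled out through `predict`.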

