RO AI · Oct 31, 2025

EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities

Travis Davies, Yiqi Huang, Alexi Gladstone, Yunxin Liu, Xiang Chen, Heng Ji, Huxian Liu, Luhui Hu

arXiv:2510.27545v1 · 7.82 citations · h-index: 4

Originality: Highly original

AI Analysis

This work addresses robustness and efficiency problems in robotic policy learning and offers a scalable approach to real-world deployment, with emergent capabilities such as zero-shot recovery.

The paper tackles the computational inefficiency and instability of diffusion-based policies in robotics by introducing EBT-Policy, an energy-based architecture that reduces inference steps by up to 50x (on some tasks, from Diffusion Policy's 100 steps to 2) and outperforms diffusion-based policies in both simulated and real-world tasks.
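The 50x figure is simply 100 diffusion steps divided by 2. To see why an energy-based policy can stop that early, it helps to assume that inference means running gradient descent on a scalar energy with respect to the action and stopping once the energy is low enough, so the step count depends on convergence rather than on a fixed schedule. The sketch below is a toy illustration under that assumption, not the authors' implementation: a hypothetical quadratic energy stands in for the learned transformer energy, `target` stands in for whatever the observation implies, and `energy_descent`, `step_size`, and `tol` are illustrative names.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class InferenceResult:
    action: Vector  # final action after descent
    energy: float   # energy of the final action
    steps: int      # number of gradient updates actually performed


def toy_energy(action: Vector, target: Vector) -> float:
    """Hypothetical stand-in for a learned energy: 0.5 * ||action - target||^2."""
    return 0.5 * sum((a - t) ** 2 for a, t in zip(action, target))


def toy_energy_grad(action: Vector, target: Vector) -> Vector:
    """Gradient of toy_energy with respect to the action."""
    return [a - t for a, t in zip(action, target)]


def energy_descent(
    init: Vector,
    target: Vector,
    step_size: float = 0.9,
    tol: float = 1e-3,
    max_steps: int = 100,
) -> InferenceResult:
    """Gradient descent on the energy, stopping as soon as it drops below tol."""
    action = list(init)
    for step in range(max_steps):
        energy = toy_energy(action, target)
        if energy < tol:  # converged early: spend no further compute
            return InferenceResult(action, energy, step)
        grad = toy_energy_grad(action, target)
        action = [a - step_size * g for a, g in zip(action, grad)]
    # Step budget exhausted without reaching tol.
    return InferenceResult(action, toy_energy(action, target), max_steps)


if __name__ == "__main__":
    fast = energy_descent([0.0, 0.0], [1.0, -2.0], step_size=0.9)
    slow = energy_descent([0.0, 0.0], [1.0, -2.0], step_size=0.1)
    print(fast.steps, slow.steps)  # fast.steps is 2 for these inputs
```

In this toy, a well-conditioned descent reaches the tolerance after two updates, while a smaller step size makes the same stopping rule spend many more iterations; the point is only that the number of steps is decided by the energy at run time, not fixed in advance as in a 100-step diffusion schedule.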

Implicit policies parameterized by generative models, such as Diffusion Policy, have become the standard for policy learning and Vision-Language-Action (VLA) models in robotics. However, these approaches often suffer from high computational cost, exposure bias, and unstable inference dynamics, which lead to divergence under distribution shifts. Energy-Based Models (EBMs) address these issues by learning energy landscapes end-to-end and modeling equilibrium dynamics, offering improved robustness and reduced exposure bias. Yet, policies parameterized by EBMs have historically struggled to scale effectively. Recent work on Energy-Based Transformers (EBTs) demonstrates the scalability of EBMs to high-dimensional spaces, but their potential for solving core challenges in physically embodied models remains underexplored. We introduce a new energy-based architecture, EBT-Policy, that solves core issues in robotic and real-world settings. Across simulated and real-world tasks, EBT-Policy consistently outperforms diffusion-based policies, while requiring less training and inference computation. Remarkably, on some tasks it converges within just two inference steps, a 50x reduction compared to Diffusion Policy's 100. Moreover, EBT-Policy exhibits emergent capabilities not seen in prior models, such as zero-shot recovery from failed action sequences using only behavior cloning and without explicit retry training. By leveraging its scalar energy for uncertainty-aware inference and dynamic compute allocation, EBT-Policy offers a promising path toward robust, generalizable robot behavior under distribution shifts.
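The abstract's point about using the scalar energy for uncertainty-aware inference can be illustrated with a second toy. Assuming a hypothetical one-dimensional energy with two basins, the sketch below descends from several starting actions, keeps the result with the lowest final energy, and flags the choice as low-confidence when even the best energy stays above a chosen threshold. This is a generic energy-based selection pattern under stated assumptions, not the procedure described in the paper; `double_well_energy`, `select_action`, and `threshold` are hypothetical names.

```python
from typing import Callable, List, Tuple


def double_well_energy(a: float) -> float:
    """Hypothetical 1-D energy with two basins; the basin near a = -1 is lower."""
    return (a * a - 1.0) ** 2 + 0.3 * a


def double_well_grad(a: float) -> float:
    """Derivative of double_well_energy with respect to a."""
    return 4.0 * a * (a * a - 1.0) + 0.3


def descend(a: float, grad: Callable[[float], float],
            step_size: float = 0.05, steps: int = 200) -> float:
    """Fixed-budget gradient descent from a single starting action."""
    for _ in range(steps):
        a -= step_size * grad(a)
    return a


def select_action(
    inits: List[float],
    energy: Callable[[float], float],
    grad: Callable[[float], float],
    threshold: float,
) -> Tuple[float, float, bool]:
    """Descend from every init, keep the lowest-energy result, flag confidence."""
    candidates = [descend(a0, grad) for a0 in inits]
    best_energy, best_action = min((energy(a), a) for a in candidates)
    confident = best_energy <= threshold  # high residual energy => treat as uncertain
    return best_action, best_energy, confident


if __name__ == "__main__":
    print(select_action([-2.0, -0.5, 0.5, 2.0], double_well_energy, double_well_grad, -0.2))
    print(select_action([2.0], double_well_energy, double_well_grad, -0.2))
```

Descending only from a = 2.0 lands in the higher basin, so the threshold check reports low confidence; adding starts on the other side recovers the lower-energy action near a = -1. Because the energy is a single scalar, the same number can rank candidates and signal when more search or compute might be warranted.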

