LG · RO · Nov 18, 2025

$π^{*}_{0.6}$: a VLA That Learns From Experience

arXiv:2511.14759v2 · 159 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enabling robots to learn and adapt from experience in real-world settings, representing a significant but incremental advance in robotics and AI.

The paper tackles the problem of improving vision-language-action (VLA) models through real-world deployments using reinforcement learning. The resulting method is demonstrated on tasks such as folding laundry and making espresso; on some of the hardest tasks it more than doubles task throughput and roughly halves the failure rate.

We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of VLAs via advantage conditioning. Our method incorporates heterogeneous data into the self-improvement process, including demonstrations, data from on-policy collection, and expert teleoperated interventions provided during autonomous execution. RECAP starts by pre-training a generalist VLA with offline RL, which we call $π^{*}_{0.6}$, that can then be specialized to attain high performance on downstream tasks through on-robot data collection. We show that the $π^{*}_{0.6}$ model trained with the full RECAP method can fold laundry in real homes, reliably assemble boxes, and make espresso drinks using a professional espresso machine. On some of the hardest tasks, RECAP more than doubles task throughput and roughly halves the task failure rate.
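
The abstract names advantage conditioning but does not spell out how training data is labeled. The sketch below illustrates one common reading of the idea, not the authors' implementation: each logged step gets a binary "good action" flag that a policy could later be conditioned on. Everything here is an assumption made for illustration, including the `Step` fields, the helper names, the Monte Carlo return minus critic estimate as the advantage, the zero threshold, and the choice to treat teleoperated corrections as always positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One logged timestep (hypothetical record layout)."""
    reward: float
    value: float                 # assumed critic estimate V(s_t)
    is_correction: bool = False  # True if a human teleoperator supplied the action


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted sum of future rewards from each timestep onward."""
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    out.reverse()
    return out


def advantage_labels(steps: List[Step], threshold: float = 0.0,
                     gamma: float = 1.0) -> List[bool]:
    """Flag steps whose estimated advantage exceeds the threshold.

    Assumption: expert corrections are always flagged as positive,
    regardless of their estimated advantage.
    """
    rtg = returns_to_go([s.reward for s in steps], gamma)
    labels = []
    for step, g in zip(steps, rtg):
        advantage = g - step.value
        labels.append(step.is_correction or advantage > threshold)
    return labels


# Toy episode: a cost of -1 per step until the task finishes.
episode = [
    Step(reward=-1.0, value=-2.0),
    Step(reward=-1.0, value=-2.5),
    Step(reward=-1.0, value=-0.5, is_correction=True),
    Step(reward=0.0, value=0.0),
]
labels = advantage_labels(episode)
print(labels)
```

In a setup like this, the flag would be fed to the policy as an extra conditioning input during training, and set to "positive" at deployment time so that the policy imitates the better-than-expected portion of its heterogeneous data (demonstrations, autonomous rollouts, and corrections).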

