LG · Apr 16
$π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
Physical Intelligence, Bo Ai, Ali Amin et al. · mit
We present a new robotic foundation model, called $π_{0.7}$, that can enable strong out-of-the-box performance in a wide range of scenarios. $π_{0.7}$ can follow diverse language instructions in unseen environments, including multi-stage tasks with various kitchen appliances, provide zero-shot cross-embodiment generalization, for example enabling a robot to fold laundry without seeing the task before, and perform challenging tasks such as operating an espresso machine out of the box at a level of performance that matches much more specialized RL-finetuned models. The main idea behind $π_{0.7}$ is to use diverse context conditioning during training. This conditioning information, contained in the prompt, makes it possible to steer the model precisely to perform many tasks with different strategies. It is conditioned not just on a language command that describes what it should do, but on additional multimodal information that also describes the manner or strategy in which it should do it, including metadata about task performance and subgoal images. This enables $π_{0.7}$ to use very diverse data, including demonstrations, potentially suboptimal (autonomous) data including failures, and data from non-robot sources. Our experiments evaluate $π_{0.7}$ across numerous tasks with multiple robot platforms, on tasks that require speed and dexterity, language following, and compositional task generalization.
LG · Oct 31, 2024
$π_0$: A Vision-Language-Action Flow Model for General Robot Control
Kevin Black, Noah Brown, Danny Driess et al.
Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial intelligence. However, bringing robot learning to the level of generality required for effective real-world systems faces major obstacles in terms of data, generalization, and robustness. In this paper, we discuss how generalist robot policies (i.e., robot foundation models) can address these challenges, and how we can design effective generalist robot policies for complex and highly dexterous tasks. We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We then discuss how this model can be trained on a large and diverse dataset from multiple dexterous robot platforms, including single-arm robots, dual-arm robots, and mobile manipulators. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people and from a high-level VLM policy, and its ability to acquire new skills via fine-tuning. Our results cover a wide variety of tasks, such as laundry folding, table cleaning, and assembling boxes.
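As a rough illustration of what a flow matching objective looks like for action generation, the sketch below uses the common straight-line (rectified flow) formulation in plain Python. It is not the architecture or the exact parameterization from the paper: the function names, the noise-at-`t=0` convention, and the Euler sampler are all assumptions made for illustration, and `predict_velocity` stands in for a learned network conditioned on observations.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical stand-in for a learned network: maps (noisy action, time) to a velocity.
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(noise: Vector, action: Vector, t: float) -> Vector:
    """Point on the straight path from noise (t=0) to the target action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def target_velocity(noise: Vector, action: Vector) -> Vector:
    """Constant velocity of the straight path: d/dt of interpolate(...)."""
    return [a - n for n, a in zip(noise, action)]


def flow_matching_loss(predict_velocity: VelocityFn, noise: Vector,
                       action: Vector, t: float) -> float:
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(noise, action, t)
    v_pred = predict_velocity(x_t, t)
    v_tgt = target_velocity(noise, action)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_tgt)) / len(v_tgt)


def sample(predict_velocity: VelocityFn, noise: Vector, steps: int = 10) -> Vector:
    """Generate an action by Euler-integrating the velocity field from t=0 to t=1."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In training, `t` and `noise` would be sampled randomly for each example and the loss minimized over the network parameters; at inference, `sample` turns a noise vector into an action in a small number of integration steps.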
LG · Apr 22, 2025
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence, Kevin Black, Noah Brown et al. · berkeley
In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $π_{0.5}$, a new model based on $π_{0}$ that uses co-training on heterogeneous tasks to enable broad generalization. $π_{0.5}$ uses data from multiple robots, high-level semantic prediction, web data, and other sources to enable broadly generalizable real-world robotic manipulation. Our system uses a combination of co-training and hybrid multi-modal examples that combine image observations, language commands, object detections, semantic subtask prediction, and low-level actions. Our experiments show that this kind of knowledge transfer is essential for effective generalization, and we demonstrate for the first time that an end-to-end learning-enabled robotic system can perform long-horizon and dexterous manipulation skills, such as cleaning a kitchen or bedroom, in entirely new homes.
LG · Nov 18, 2025
$π^{*}_{0.6}$: a VLA That Learns From Experience
Physical Intelligence, Ali Amin, Raichelle Aniceto et al.
We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of VLAs via advantage conditioning. Our method incorporates heterogeneous data into the self-improvement process, including demonstrations, data from on-policy collection, and expert teleoperated interventions provided during autonomous execution. RECAP starts by pre-training a generalist VLA with offline RL, which we call $π^{*}_{0.6}$, that can then be specialized to attain high performance on downstream tasks through on-robot data collection. We show that the $π^{*}_{0.6}$ model trained with the full RECAP method can fold laundry in real homes, reliably assemble boxes, and make espresso drinks using a professional espresso machine. On some of the hardest tasks, RECAP more than doubles task throughput and roughly halves the task failure rate.
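The general idea of advantage conditioning can be sketched as follows: each training sample is tagged with whether its outcome was better or worse than a value estimate predicted, and the policy is then prompted with the "better" tag at deployment. This is a minimal sketch under stated assumptions, not the RECAP implementation: the `Transition` fields, the zero threshold, and the prompt format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """Hypothetical training sample; field names are illustrative only."""
    instruction: str        # language command for the task
    episode_return: float   # return actually observed from this step onward
    value_estimate: float   # return predicted by a learned value function


def advantage(tr: Transition) -> float:
    """How much better the observed outcome was than the value estimate."""
    return tr.episode_return - tr.value_estimate


def conditioned_prompt(tr: Transition, threshold: float = 0.0) -> str:
    """Training prompt: the instruction plus a binarized advantage label."""
    label = "positive" if advantage(tr) > threshold else "negative"
    return f"{tr.instruction} | Advantage: {label}"


def inference_prompt(instruction: str) -> str:
    """Deployment prompt: always request the better-than-expected behavior."""
    return f"{instruction} | Advantage: positive"
```

Because the label is only a conditioning input, demonstrations, autonomous rollouts (including failures), and corrected episodes can all be kept in the same training set rather than filtered out.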
LG · Oct 7, 2019
Organization of machine learning based product development as per ISO 26262 and ISO/PAS 21448
Krystian Radlak, Michał Szczepankiewicz, Tim Jones et al.
Machine learning (ML) algorithms generate a continuous stream of success stories from various domains and enable many novel applications in safety-critical systems. With the advent of autonomous driving, ML algorithms are being used in the automotive domain, where the applicable functional safety standard is ISO 26262. However, requirements and recommendations provided by ISO 26262 do not cover specific properties of machine learning algorithms. Therefore, specific aspects of ML (e.g., dataset requirements, performance evaluation metrics, lack of interpretability) must be addressed within some work products, which collect documentation resulting from one or more associated requirements and recommendations of ISO 26262. In this paper, we propose how key technical aspects and supporting processes related to development of ML-based systems can be organized according to ISO 26262 phases, sub-phases, and work products. We follow the same approach as in the ISO/PAS 21448 standard, which complements ISO 26262, in order to account for edge cases that can lead to hazards not directly caused by system failure.