99.1CVJun 1Code
Cosmos 3: Omnimodal World Models for Physical AIAditi, Niket Agarwal, Arslan Ali et al.
We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI -- effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 https://openmdw.ai/license/1-1/ License at https://github.com/nvidia/cosmos}{github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3 . The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3 .
91.5ROApr 14
RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist PoliciesXuning Yang, Rishit Dagli, Alex Zook et al. · nvidia
The pursuit of general-purpose robotics has yielded impressive foundation models, yet simulation-based benchmarking remains a bottleneck due to rapid performance saturation and a lack of true generalization testing. Existing benchmarks often exhibit significant domain overlap between training and evaluation, trivializing success rates and obscuring insights into robustness. We introduce RoboLab, a simulation benchmarking framework designed to address these challenges. Concretely, our framework is designed to answer two questions: (1) to what extent can we understand the performance of a real-world policy by analyzing its behavior in simulation, and (2) which external factors most strongly affect that behavior under controlled perturbations. First, RoboLab enables human-authored and LLM-enabled generation of scenes and tasks in a robot- and policy-agnostic manner within a physically realistic and photorealistic simulation. With this, we propose the RoboLab-120 benchmark, consisting of 120 tasks categorized into three competency axes: visual, procedural, relational competency, across three difficulty levels. Second, we introduce a systematic analysis of real-world policies that quantify both their performance and the sensitivity of their behavior to controlled perturbations, indicating that high-fidelity simulation can serve as a proxy for analyzing performance and its dependence on external factors. Evaluation with RoboLab exposes significant performance gap in current state-of-the-art models. By providing granular metrics and a scalable toolset, RoboLab offers a scalable framework for evaluating the true generalization capabilities of task-generalist robotic policies.
ROOct 15, 2025
VLA-0: Building State-of-the-Art VLAs with Zero ModificationAnkit Goyal, Hugo Hadfield, Xuning Yang et al. · nvidia
Vision-Language-Action models (VLAs) hold immense promise for enabling generalist robot manipulation. However, the best way to build them remains an open question. Current approaches often add complexity, such as modifying the existing vocabulary of a Vision-Language Model (VLM) with action tokens or introducing special action heads. Curiously, the simplest strategy of representing actions directly as text has remained largely unexplored. This work introduces VLA-0 to investigate this idea. We find that VLA-0 is not only effective; it is surprisingly powerful. With the right design, VLA-0 outperforms more involved models. On LIBERO, a popular benchmark for evaluating VLAs, VLA-0 outperforms all existing methods trained on the same robotic data, including $π_0.5$-KI, OpenVLA-OFT and SmolVLA. Furthermore, without large-scale robotics-specific training, it outperforms methods trained on large-scale robotic data, like $π_0.5$-KI, $π_0$, GR00T-N1 and MolmoAct. These findings also translate to the real world, where VLA-0 outperforms SmolVLA, a VLA model pre-trained on large-scale real data. This paper summarizes our unexpected findings and spells out the specific techniques required to unlock the high performance of this simple yet potent VLA design. Visual results, code, and trained models are provided here: https://vla0.github.io/.
ROSep 26, 2021
Singularities of serial robots: Identification and distance computation using geometric algebraIsiah Zaplana, Hugo Hadfield, Joan Lasenby
The singularities of serial robotic manipulators are those configurations in which the robot loses the ability to move in at least one direction. Hence, their identification is fundamental to enhance the performance of current control and motion planning strategies. While classical approaches entail the computation of the determinant of either a 6x n or nxn matrix for an n degrees of freedom serial robot, this work addresses a novel singularity identification method based on modelling the twists defined by the joint axes of the robot as vectors of the six-dimensional and three-dimensional geometric algebras. In particular, it consists of identifying which configurations cause the exterior product of these twists to vanish. In addition, since rotors represent rotations in geometric algebra, once these singularities have been identified, a distance function is defined in the configuration space C such that its restriction to the set of singular configurations S allows us to compute the distance of any configuration to a given singularity. This distance function is used to enhance how the singularities are handled in three different scenarios, namely motion planning, motion control and bilateral teleoperation.
ROSep 25, 2021
Closed-form solutions for the inverse kinematics of serial robots using conformal geometric algebraIsiah Zaplana, Hugo Hadfield, Joan Lasenby
This work addresses the inverse kinematics of serial robots using conformal geometric algebra. Classical approaches include either the use of homogeneous matrices, which entails high computational cost and execution time or the development of particular geometric strategies that cannot be generalized to arbitrary serial robots. In this work, we present a compact, elegant and intuitive formulation of robot kinematics based on conformal geometric algebra that provides a suitable framework for the closed-form resolution of the inverse kinematic problem for manipulators with a spherical wrist. For serial robots of this kind, the inverse kinematics problem can be split in two subproblems: the position and orientation problems. The latter is solved by appropriately splitting the rotor that defines the target orientation into three simpler rotors, while the former is solved by developing a geometric strategy for each combination of prismatic and revolute joints that forms the position part of the robot. Finally, the inverse kinematics of 7 DoF redundant manipulators with a spherical wrist is solved by extending the geometric solutions obtained in the non-redundant case.