cs.RO · Computer Science

Robotics

Robot systems, control, planning, perception

100.0 · CV · Jun 1 · Code

Cosmos 3: Omnimodal World Models for Physical AI

Aditi, Niket Agarwal, Arslan Ali et al.

This work provides a scalable, general-purpose backbone for embodied agents by unifying multiple modalities into a single framework, which is a significant step for Physical AI research.

96.6 · LG · Apr 16

$π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities

Physical Intelligence, Bo Ai, Ali Amin et al. · mit

For roboticists, π0.7 provides a generalist model that reduces the need for task-specific fine-tuning, enabling broad applicability across platforms and tasks.

83.7 · CV · Apr 20

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

Jinghui Lu, Jiayi Guan, Zhijian Huang et al.

For autonomous driving systems requiring real-time decision-making, OneVL provides a method to achieve high accuracy without the latency overhead of autoregressive reasoning.

83.1 · RO · Apr 22

JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy

Tianle Zhang, Zhihao Yuan, Dafeng Chi et al.

JoyAI-RA addresses insufficient data diversity and poor cross-embodiment generalization in robotic manipulation, and is a novel method rather than an incremental improvement.

77.4 · RO · May 28

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Qiuyue Wang, Mingsheng Li, Jian Guan et al.

This work addresses the fragmentation in embodied AI by proposing a unified model that generalizes across diverse tasks and robot platforms, reducing the need for specialized models.

73.5 · RO · Apr 30

World Model for Robot Learning: A Comprehensive Survey

Bohan Hou, Gen Li, Jindou Jia et al.

For researchers in robot learning, this survey organizes a rapidly growing but fragmented field, offering a unified perspective on world models and their applications.

72.6 · RO · May 18

WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform

Yu Shang, Yinzhou Tang, Yiding Ma et al.

This benchmark addresses the need for comprehensive evaluation of embodied world models, which is crucial for researchers developing multimodal, interactive, and real-world-capable AI agents.

71.5 · CV · Jun 2

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

Aarti Basant, Amlan Kar, Despoina Paschalidou et al. · nvidia

This work addresses the critical bottleneck of safe evaluation of autonomous driving policies in long-tail scenarios by providing a scalable, reactive simulation environment.

71.5 · CV · Mar 16

Kimodo: Scaling Controllable Human Motion Generation

Davis Rempe, Mathis Petrovich, Ye Yuan et al.

Kimodo addresses the need for scalable, high-quality human motion data in robotics, simulation, and entertainment, a significant advance over earlier, more limited datasets.

71.0 · RO · May 12

World Action Models: The Next Frontier in Embodied AI

Siyin Wang, Junhao Shi, Zhaoyang Fu et al.

For researchers in embodied AI, this survey offers the first systematic framework to understand and compare WAM approaches, clarifying architectural trade-offs and future directions.

69.6 · RO · May 18 · Code

Dexora: Open-source VLA for High-DoF Bimanual Dexterity

Zongzheng Zhang, Jingrui Pang, Zhuo Yang et al.

Dexora addresses the lack of open-source VLA systems for high-degree-of-freedom bimanual dexterous manipulation, enabling broader research in embodied AI.

69.4 · RO · Apr 7 · Code

A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

Kaidong Zhang, Jian Zhang, Rongtao Xu et al.

For researchers and practitioners, A1 targets the high cost of real-time robot control with a more efficient and transparent model, though the contribution is incremental: it optimizes existing methods.

68.3 · RO · Apr 28

RISE: Self-Improving Robot Policy with Compositional World Model

Jiazhi Yang, Kunyang Lin, Jinwei Li et al.

For robotic manipulation, RISE enables safe and scalable reinforcement learning without physical interaction, significantly improving robustness in contact-rich tasks.

67.6 · RO · Mar 16 · Code

Ego to World: Collaborative Spatial Reasoning in Embodied Systems via Reinforcement Learning

Heng Zhou, Li Kang, Yiran Qin et al.

This work addresses collaborative spatial reasoning in embodied AI systems, offering a principled foundation for learning world-centric scene understanding from egocentric observations. It appears incremental, since it builds on existing methods such as reinforcement learning and vision-language models.

67.4 · RO · Apr 1

SMASH: Mastering Scalable Whole-Body Skills for Humanoid Ping-Pong with Egocentric Vision

Junli Ren, Yinghui Li, Kai Zhang et al.

This work addresses the challenge of dynamic humanoid interaction tasks for robotics, advancing beyond prior systems that relied on external sensing and decoupled control.

67.3 · RO · May 4

MolmoAct2: Action Reasoning Models for Real-world Deployment

Haoquan Fang, Jiafei Duan, Donovan Clay et al.

For robotics researchers and practitioners, this work provides a fully open, high-performing VLA model with practical deployment considerations (latency, hardware cost), though it is an incremental improvement over existing VLA approaches.

67.2 · CV · Apr 2 · Code

UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

Yongkang Li, Lijun Zhou, Sixu Yan et al.

UniDriveVLA addresses a critical bottleneck in autonomous driving systems by improving perception and reasoning capabilities, though it appears incremental, since it builds on existing VLA frameworks.

66.1 · RO · Apr 22

Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics

Open-H-Embodiment Consortium, Nigel Nelson, Juo-Tung Chen et al.

For medical robotics researchers, this dataset and models address the data bottleneck hindering foundation model development, providing critical infrastructure for robot learning and world modeling.

65.7 · RO · Apr 9

HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

Shuanghao Bai, Meng Li, Xinyuan Lv et al.

HEX addresses unstable and inefficient whole-body control for humanoid robots, enabling better manipulation in fast-reaction and long-horizon scenarios.

64.9 · LG · Apr 24

RL Token: Bootstrapping Online RL with Vision-Language-Action Models

Charles Xu, Jost Tobias Springenberg, Michael Equi et al.

For robotics practitioners, this enables rapid fine-tuning of large VLAs to achieve precise and fast manipulation skills with minimal real-world interaction.