cs.RO Computer Science

Robotics

Robot systems, control, planning, perception

100.0 CV Jun 1 Code
Cosmos 3: Omnimodal World Models for Physical AI

Aditi, Niket Agarwal, Arslan Ali et al.

This work provides a scalable, general-purpose backbone for embodied agents by unifying multiple modalities into a single framework, which is a significant step for Physical AI research.

83.1 RO Apr 22
JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy

Tianle Zhang, Zhihao Yuan, Dafeng Chi et al.

This addresses the challenge of insufficient data diversity and poor cross-embodiment generalization for robotic manipulation, representing a novel method rather than an incremental improvement.

71.5 CV Mar 16
Kimodo: Scaling Controllable Human Motion Generation

Davis Rempe, Mathis Petrovich, Ye Yuan et al.

This addresses the need for scalable, high-quality human motion data for applications in robotics, simulation, and entertainment, representing a significant advancement over previous limited datasets.

71.0 RO May 12
World Action Models: The Next Frontier in Embodied AI

Siyin Wang, Junhao Shi, Zhaoyang Fu et al.

For researchers in embodied AI, this survey offers the first systematic framework for understanding and comparing World Action Model (WAM) approaches, clarifying architectural trade-offs and future directions.

67.6 RO Mar 16 Code
Ego to World: Collaborative Spatial Reasoning in Embodied Systems via Reinforcement Learning

Heng Zhou, Li Kang, Yiran Qin et al.

This work tackles collaborative spatial reasoning in embodied AI systems, offering a principled foundation for learning world-centric scene understanding from egocentric observations. It appears incremental, however, since it builds on existing methods such as reinforcement learning and vision-language models.

67.3 RO May 4
MolmoAct2: Action Reasoning Models for Real-world Deployment

Haoquan Fang, Jiafei Duan, Donovan Clay et al.

For robotics researchers and practitioners, this work provides a fully open, high-performing VLA model with practical deployment considerations (latency, hardware cost), though it is an incremental improvement over existing VLA approaches.