99.7 | RO | Apr 22
Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics
Open-H-Embodiment Consortium, Nigel Nelson, Juo-Tung Chen et al.
Autonomous medical robots hold promise to improve patient outcomes, reduce provider workload, democratize access to care, and enable superhuman precision. However, autonomous medical robotics has been limited by a fundamental data problem: existing medical robotic datasets are small, single-embodiment, and rarely shared openly, restricting the development of the foundation models that the field needs to advance. We introduce Open-H-Embodiment, the largest open dataset of medical robotic video with synchronized kinematics to date. It spans more than 49 institutions and multiple robotic platforms, including the CMR Versius, Intuitive Surgical's da Vinci, da Vinci Research Kit (dVRK), Rob Surgical BiTrack, Virtual Incision's MIRA, Moon Surgical Maestro, and a variety of custom systems, and covers surgical manipulation, robotic ultrasound, and endoscopy procedures. We demonstrate the research enabled by this dataset through two foundation models. GR00T-H, the first open foundation vision-language-action model for medical robotics, is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all others), and it achieves 64% average success across a 29-step ex vivo suturing sequence. We also train Cosmos-H-Surgical-Simulator, the first action-conditioned world model to enable multi-embodiment surgical simulation from a single checkpoint, spanning nine robotic platforms and supporting in silico policy evaluation and synthetic data generation for the medical domain. These results suggest that open, large-scale medical robot data collection can serve as critical infrastructure for the research community, enabling advances in robot learning, world modeling, and beyond.
CV | Mar 20, 2025
From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction
Ayberk Acar, Mariana Smith, Lidia Al-Zogbi et al.
Surgical automation requires precise guidance and understanding of the scene. Current methods in the literature rely on bulky depth cameras to create maps of the anatomy; however, this does not translate well to space-limited clinical applications. Monocular cameras are small and allow minimally invasive surgeries in tight spaces, but additional processing is required to generate 3D scene understanding. We propose a 3D mapping pipeline that uses only RGB images to create segmented point clouds of the target anatomy. To ensure the most precise reconstruction, we compare the performance of different structure-from-motion algorithms on mapping central airway obstructions, and test the pipeline on a downstream task of tumor resection. In several metrics, including post-procedure tissue model evaluation, our pipeline performs comparably to RGB-D cameras and, in some cases, even surpasses their performance. These promising results demonstrate that automation guidance can be achieved in minimally invasive procedures with monocular cameras. This study is a step toward the complete autonomy of surgical robots.
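As a rough illustration of what a "segmented point cloud" can mean in practice, the sketch below keeps only the reconstructed 3D points whose pinhole projection lands on a 2D segmentation mask. This is not the authors' pipeline: the function name `segment_point_cloud`, the zero-skew intrinsic matrix `K`, and the assumption that points are already expressed in the camera frame are all hypothetical choices made for the example.

```python
import numpy as np

def segment_point_cloud(points_cam, K, mask):
    """Keep 3D points (camera frame) that project onto True pixels of `mask`.

    points_cam: (N, 3) array of points in the camera frame (assumed input).
    K:          (3, 3) pinhole intrinsics, zero skew assumed.
    mask:       (H, W) boolean segmentation mask of the target anatomy.
    """
    points_cam = np.asarray(points_cam, dtype=float)
    h, w = mask.shape
    keep = np.zeros(len(points_cam), dtype=bool)

    # Only points in front of the camera can be projected.
    front = np.flatnonzero(points_cam[:, 2] > 0)
    p = points_cam[front]

    # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy.
    u = np.rint(K[0, 0] * p[:, 0] / p[:, 2] + K[0, 2]).astype(int)
    v = np.rint(K[1, 1] * p[:, 1] / p[:, 2] + K[1, 2]).astype(int)

    # Discard projections that fall outside the image bounds.
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    keep[front[inside]] = mask[v[inside], u[inside]].astype(bool)
    return points_cam[keep]
```

In a multi-view setting one would typically repeat this test per image and combine the votes; the single-view version is shown only to keep the example short.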
RO | Dec 2, 2020
Estimation of Trocar and Tool Interaction Forces on the da Vinci Research Kit with Two-Step Deep Learning
Jie Ying Wu, Nural Yilmaz, Peter Kazanzides et al.
Measurement of environment interaction forces during robotic minimally-invasive surgery would enable haptic feedback to the surgeon, thereby solving one long-standing limitation. Estimating this force from existing sensor data avoids the challenge of retrofitting systems with force sensors, but is difficult due to mechanical effects such as friction and compliance in the robot mechanism. We have previously shown that neural networks can be trained to estimate the internal robot joint torques, thereby enabling estimation of external forces. In this work, we extend the method to estimate external Cartesian forces and torques, and also present a two-step approach to adapt to the specific surgical setup by compensating for forces due to the interactions between the instrument shaft and cannula seal and between the trocar and patient body. Experiments show that this approach provides estimates of external forces and torques within a mean root-mean-square error (RMSE) of 2 N and 0.08 Nm, respectively. Furthermore, the two-step approach can add as little as 5 minutes to the surgery setup time, with about 4 minutes to collect intraoperative training data and 1 minute to train the second-step network.
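The step from joint torques to external Cartesian forces can be sketched with the standard statics relation $\tau_{ext} = J^\top F$. The example below is a minimal sketch under stated assumptions, not the method from the paper: it takes the internal torque prediction as a given input (standing in for the network output), assumes a known 6×n geometric Jacobian, and solves for the wrench by least squares. The names `external_wrench` and `rmse` are hypothetical, and the sign convention for the wrench depends on the setup.

```python
import numpy as np

def external_wrench(tau_measured, tau_internal_pred, jacobian):
    """Estimate the external Cartesian wrench [Fx, Fy, Fz, Tx, Ty, Tz].

    tau_measured:      (n,) measured joint torques.
    tau_internal_pred: (n,) predicted internal (free-space) joint torques;
                       here simply an input, standing in for a learned model.
    jacobian:          (6, n) geometric Jacobian at the current configuration.
    """
    # Torque not explained by the internal model is attributed to contact.
    tau_ext = np.asarray(tau_measured) - np.asarray(tau_internal_pred)
    # tau_ext = J^T F, so solve the linear system for F in a least-squares sense.
    wrench, *_ = np.linalg.lstsq(np.asarray(jacobian).T, tau_ext, rcond=None)
    return wrench

def rmse(estimate, reference):
    """Root-mean-square error between two equally shaped arrays."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

A typical use would be to call `external_wrench` at each time step and report `rmse` separately for the force and torque components against a reference force sensor, which matches the way the abstract states its error figures in N and Nm.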