Alan Kuntz

RO
h-index: 28
Papers: 20
Citations: 232
Novelty: 53%
AI Score: 55

20 Papers

25.5 · RO · Jun 3
Continuum Robot State Estimation with Actuation Uncertainty

James M. Ferguson, Alan Kuntz, Tucker Hermans

Continuum robots are flexible, slender manipulators well suited for confined surgical environments. In these settings, unknown interaction forces and model uncertainty significantly affect robot shape, motivating state estimation from external observations. Existing estimation methods either neglect actuation modeling or rely on simplified deterministic actuation models. In contrast, we jointly estimate robot shape, external loads, and actuation inputs using mechanically principled actuation priors. To achieve this, we present a discrete Cosserat rod formulation with piecewise-linear strain integration that provides high numerical accuracy while inducing a sparse factor graph structure for efficient nonlinear optimization. We extend the framework to tendon-driven and parallel robots in simulation and validate it experimentally on a surgical concentric tube robot. Overall, our approach enables principled real-time estimation across multiple robot architectures while providing direct access to manipulator Jacobians through the linearized factor graph.

RO · Sep 25, 2023
DefGoalNet: Contextual Goal Learning from Demonstrations For Deformable Object Manipulation

Bao Thach, Tanner Watts, Shing-Hei Ho et al. · nvidia

Shape servoing, a robotic task dedicated to controlling objects to desired goal shapes, is a promising approach to deformable object manipulation. An issue arises, however, with the reliance on the specification of a goal shape. This goal has been obtained either by a laborious domain knowledge engineering process or by manually manipulating the object into the desired shape and capturing the goal shape at that specific moment, both of which are impractical in various robotic applications. In this paper, we solve this problem by developing a novel neural network DefGoalNet, which learns deformable object goal shapes directly from a small number of human demonstrations. We demonstrate our method's effectiveness on various robotic tasks, both in simulation and on a physical robot. Notably, in the surgical retraction task, even when trained with as few as 10 demonstrations, our method achieves a median success percentage of nearly 90%. These results mark a substantial advancement in enabling shape servoing methods to bring deformable object manipulation closer to practical, real-world applications.

99.6 · RO · Apr 22
Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics

Open-H-Embodiment Consortium, Nigel Nelson, Juo-Tung Chen et al.

Autonomous medical robots hold promise to improve patient outcomes, reduce provider workload, democratize access to care, and enable superhuman precision. However, autonomous medical robotics has been limited by a fundamental data problem: existing medical robotic datasets are small, single-embodiment, and rarely shared openly, restricting the development of foundation models that the field needs to advance. We introduce Open-H-Embodiment, the largest open dataset of medical robotic video with synchronized kinematics to date, spanning more than 49 institutions and multiple robotic platforms including the CMR Versius, Intuitive Surgical's da Vinci, da Vinci Research Kit (dVRK), Rob Surgical BiTrack, Virtual Incision's MIRA, Moon Surgical Maestro, and a variety of custom systems, spanning surgical manipulation, robotic ultrasound, and endoscopy procedures. We demonstrate the research enabled by this dataset through two foundation models. GR00T-H is the first open foundation vision-language-action model for medical robotics, which is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all others) and achieves 64% average success across a 29-step ex vivo suturing sequence. We also train Cosmos-H-Surgical-Simulator, the first action-conditioned world model to enable multi-embodiment surgical simulation from a single checkpoint, spanning nine robotic platforms and supporting in silico policy evaluation and synthetic data generation for the medical domain. These results suggest that open, large-scale medical robot data collection can serve as critical infrastructure for the research community, enabling advances in robot learning, world modeling, and beyond.

59.2 · RO · Mar 24
ProbeMDE: Uncertainty-Guided Active Proprioception for Monocular Depth Estimation in Surgical Robotics

Britton Jordan, Jordan Thompson, Jesse F. d'Almeida et al.

Monocular depth estimation (MDE) provides a useful tool for robotic perception, but its predictions are often uncertain and inaccurate in challenging environments such as surgical scenes where textureless surfaces, specular reflections, and occlusions are common. To address this, we propose ProbeMDE, a cost-aware active sensing framework that combines RGB images with sparse proprioceptive measurements for MDE. Our approach utilizes an ensemble of MDE models to predict dense depth maps conditioned on both RGB images and on a sparse set of known depth measurements obtained via proprioception, where the robot has touched the environment in a known configuration. We quantify predictive uncertainty via the ensemble's variance and measure the gradient of the uncertainty with respect to candidate measurement locations. To prevent mode collapse while selecting maximally informative locations to propriocept (touch), we leverage Stein Variational Gradient Descent (SVGD) over this gradient map. We validate our method in both simulated and physical experiments on central airway obstruction surgical phantoms. Our results demonstrate that our approach outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of required proprioceptive measurements. Project page: https://brittonjordan.github.io/probe_mde/
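The ensemble-variance step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy depth values are hypothetical, and a plain argmax over the variance map stands in for the gradient-based SVGD selection that the abstract describes.

```python
from statistics import fmean, pvariance


def ensemble_mean_and_variance(depth_maps):
    """Per-pixel mean and variance across ensemble predictions.

    depth_maps: list of equally sized 2D lists, one per ensemble member
    (hypothetical stand-ins for the outputs of the MDE models).
    """
    rows, cols = len(depth_maps[0]), len(depth_maps[0][0])
    mean = [[0.0] * cols for _ in range(rows)]
    var = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            samples = [m[r][c] for m in depth_maps]
            mean[r][c] = fmean(samples)
            var[r][c] = pvariance(samples)  # ensemble disagreement
    return mean, var


def most_uncertain_pixel(var):
    """Simplified selection: the single pixel with the largest variance."""
    pixels = ((r, c) for r in range(len(var)) for c in range(len(var[0])))
    return max(pixels, key=lambda rc: var[rc[0]][rc[1]])


# Toy example: three ensemble members that disagree only at pixel (1, 1).
predictions = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 2.0], [3.0, 6.0]],
    [[1.0, 2.0], [3.0, 8.0]],
]
mean_map, var_map = ensemble_mean_and_variance(predictions)
probe_at = most_uncertain_pixel(var_map)
```

Picking only the maximum-variance pixel would concentrate repeated probes in one region, which is the mode-collapse problem the abstract says SVGD is used to avoid.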

RO · Jan 1, 2024 · Code
General-purpose foundation models for increased autonomy in robot-assisted surgery

Samuel Schmidgall, Ji Woong Kim, Alan Kuntz et al.

The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise toward being trained on large collections of diverse and task-agnostic datasets of video demonstrations. These models have shown impressive levels of generalization to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for a few reasons: (1) there is a lack of existing large-scale open-source data to train models, (2) it is challenging to model the soft-body deformations that these robots work with during surgery because simulation cannot match the physical and visual complexity of biological tissue, and (3) surgical robots risk harming patients when tested in clinical trials and require more extensive safety measures. This perspective article aims to provide a path toward increasing robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision-language-action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models and provide three guiding actions toward increased autonomy in robot-assisted surgery.

15.4 · RO · May 18
Neural Operators for Design-Space Surrogate Modeling of Tendon-Actuated Continuum Robots

Branden Frieden, James M. Ferguson, Alan Kuntz et al.

Continuum robots enable dexterous manipulation in constrained environments, but require accurate and efficient models for real-time manipulation and control. Traditional physics-based models can be computationally expensive and may suffer from inaccuracies due to unmodeled effects, while current learning-based methods often generalize poorly beyond the specific robot on which they are trained. We present a formulation of surrogate modeling for tendon-driven continuum robots as an operator learning problem that maps robot design parameters and tendon actuation inputs to resulting configurations. This formulation enables a single trained model to generalize across a large class of robot designs. We develop four novel neural operator architectures--two based on Deep Operator Networks (DeepONets) and two based on Fourier Neural Operators (FNOs)--and train them on simulation data to predict robot configurations. All architectures achieve good accuracy while allowing for fast and accurate generalization across designs. Our results demonstrate that operator learning provides an effective and generalizable surrogate for continuum robot mechanics in the design space, enabling fast modeling for control, planning, and design optimization in surgical and industrial applications.

27.1 · RO · Mar 24
PinPoint: Monocular Needle Pose Estimation for Robotic Suturing via Stein Variational Newton and Geometric Residuals

Jesse F. d'Almeida, Tanner Watts, Susheela Sharma Stern et al.

Reliable estimation of surgical needle 3D position and orientation is essential for autonomous robotic suturing, yet existing methods operate almost exclusively under stereoscopic vision. In monocular endoscopic settings, common in transendoscopic and intraluminal procedures, depth ambiguity and rotational symmetry render needle pose estimation inherently ill-posed, producing a multimodal distribution over feasible configurations, rather than a single, well-grounded estimate. We present PinPoint, a probabilistic variational inference framework that treats this ambiguity directly, maintaining a distribution of pose hypotheses rather than suppressing it. PinPoint combines monocular image observations with robot-grasp constraints through analytical geometric likelihoods with closed-form Jacobians. This framework enables efficient Gauss-Newton preconditioning in a Stein Variational Newton inference, where second-order particle transport deterministically moves particles toward high-probability regions while kernel-based repulsion preserves diversity in the multimodal structure. On real needle-tracking sequences, PinPoint reduces mean translational error by 80% (down to 1.00 mm) and rotational error by 78% (down to 13.80°) relative to a particle-filter baseline, with substantially better-calibrated uncertainty. On induced-rotation sequences, where monocular ambiguity is most severe, PinPoint maintains a bimodal posterior 84% of the time, almost three times the rate of the particle filter baseline, correctly preserving the alternative hypothesis rather than committing prematurely to one mode. Suturing experiments in ex vivo tissue demonstrate stable tracking through intermittent occlusion, with average errors during occlusion of 1.34 mm in translation and 19.18° in rotation, even when the needle is fully embedded.

RO · Apr 10, 2024
Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery

Zohre Karimi, Shing-Hei Ho, Bao Thach et al.

Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach on a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are high-dimensional point clouds. Code and videos available here: https://sites.google.com/view/lfdinelectrocautery

CV · Mar 20, 2025
From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction

Ayberk Acar, Mariana Smith, Lidia Al-Zogbi et al.

Surgical automation requires precise guidance and understanding of the scene. Current methods in the literature rely on bulky depth cameras to create maps of the anatomy; however, this does not translate well to space-limited clinical applications. Monocular cameras are small and allow minimally invasive surgeries in tight spaces, but additional processing is required to generate 3D scene understanding. We propose a 3D mapping pipeline that uses only RGB images to create segmented point clouds of the target anatomy. To ensure the most precise reconstruction, we compare the performance of different structure-from-motion algorithms on mapping central airway obstructions, and test the pipeline on a downstream task of tumor resection. In several metrics, including post-procedure tissue model evaluation, our pipeline performs comparably to RGB-D cameras and, in some cases, even surpasses their performance. These promising results demonstrate that automation guidance can be achieved in minimally invasive procedures with monocular cameras. This study is a step toward the complete autonomy of surgical robots.

RO · May 8, 2023
DeformerNet: Learning Bimanual Manipulation of 3D Deformable Objects

Bao Thach, Brian Y. Cho, Shing-Hei Ho et al.

Applications in fields ranging from home care to warehouse fulfillment to surgical assistance require robots to reliably manipulate the shape of 3D deformable objects. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the manipulated object and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn a visual servo controller that computes the desired robot end-effector action to iteratively deform the object toward the target shape. We demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training, including ex vivo chicken muscle tissue. Crucially, using DeformerNet, the robot successfully accomplishes three surgical sub-tasks: retraction (moving tissue aside to access a site underneath it), tissue wrapping (a sub-task in procedures like aortic stent placements), and connecting two tubular pieces of tissue (a sub-task in anastomosis).

RO · Oct 15, 2021
Toward Learning Context-Dependent Tasks from Demonstration for Tendon-Driven Surgical Robots

Yixuan Huang, Michael Bentley, Tucker Hermans et al.

Tendon-driven robots, a type of continuum robot, have the potential to reduce the invasiveness of surgery by enabling access to difficult-to-reach anatomical targets. In the future, the automation of surgical tasks for these robots may help reduce surgeon strain in the face of a rapidly growing population. However, directly encoding surgical tasks and their associated context for these robots is infeasible. In this work, we take steps toward a system that can learn to perform context-dependent surgical tasks directly from a set of expert demonstrations. We present three models trained on the demonstrations, each conditioned on a vector encoding the context of the demonstration. We then use these models to plan and execute motions for the tendon-driven robot that are similar to the demonstrations, for novel contexts not seen in the training set. We demonstrate the efficacy of our method on three surgery-inspired tasks.

RO · Oct 12, 2021
Planning Sensing Sequences for Subsurface 3D Tumor Mapping

Brian Y. Cho, Tucker Hermans, Alan Kuntz

Surgical automation has the potential to enable increased precision and reduce the per-patient workload of overburdened human surgeons. An effective automation system must be able to sense and map subsurface anatomy, such as tumors, efficiently and accurately. In this work, we present a method that plans a sequence of sensing actions to map the 3D geometry of subsurface tumors. We leverage a sequential Bayesian Hilbert map to create a 3D probabilistic occupancy model that represents the likelihood that any given point in the anatomy is occupied by a tumor, conditioned on sensor readings. We iteratively update the map, utilizing Bayesian optimization to determine sensing poses that explore unsensed regions of anatomy and exploit the knowledge gained by previous sensing actions. We demonstrate our method's efficiency and accuracy in three anatomical scenarios including a liver tumor scenario generated from a real patient's CT scan. The results show that our proposed method significantly outperforms comparison methods in terms of efficiency while detecting subsurface tumors with high accuracy.

RO · Oct 10, 2021
Learning Visual Shape Control of Novel 3D Deformable Objects from Partial-View Point Clouds

Bao Thach, Brian Y. Cho, Alan Kuntz et al.

If robots could reliably manipulate the shape of 3D deformable objects, they could find applications in fields ranging from home care to warehouse fulfillment to surgical assistance. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the object being manipulated and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn to define a visual servo controller that provides Cartesian pose changes to the robot end-effector causing the object to deform towards its target shape. Crucially, we demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training and outperforms comparison methods for both the generic shape control and the surgical task of retraction.

RO · Jul 16, 2021
DeformerNet: A Deep Learning Approach to 3D Deformable Object Manipulation

Bao Thach, Alan Kuntz, Tucker Hermans

In this paper, we propose a novel approach to 3D deformable object manipulation leveraging a deep neural network called DeformerNet. Controlling the shape of a 3D object requires an effective state representation that can capture the full 3D geometry of the object. Current methods work around this problem by defining a set of feature points on the object or only deforming the object in 2D image space, which does not truly address the 3D shape control problem. Instead, we explicitly use 3D point clouds as the state representation and apply a convolutional neural network to the point clouds to learn the 3D features. These features are then mapped to the robot end-effector's position using a fully connected neural network. Once trained in an end-to-end fashion, DeformerNet directly maps the current point cloud of a deformable object, as well as a target point cloud shape, to the desired displacement in robot gripper position. In addition, we investigate the problem of predicting the manipulation point location given the initial and goal shape of the object.

RO · Jan 13, 2021
A Recurrent Neural Network Approach to Roll Estimation for Needle Steering

Maxwell Emerson, James M. Ferguson, Tayfun Efe Ertop et al.

Steerable needles are a promising technology for delivering targeted therapies in the body in a minimally invasive fashion, as they can curve around anatomical obstacles and home in on anatomical targets. In order to accurately steer them, controllers must have full knowledge of the needle tip's orientation. However, current sensors either do not provide full orientation information or interfere with the needle's ability to deliver therapy. Further, torsional dynamics can vary and depend on many parameters, making steerable needles difficult to model accurately and limiting the effectiveness of traditional observer methods. To overcome these limitations, we propose a model-free, learned method that leverages LSTM neural networks to estimate the needle tip's orientation online. We validate our method by integrating it into a sliding-mode controller and steering the needle to targets in gelatin and ex vivo ovine brain tissue. We compare our method's performance against an Extended Kalman Filter, a model-based observer, achieving significantly lower targeting errors.

AI · Jan 8, 2021
Optimizing Hospital Room Layout to Reduce the Risk of Patient Falls

Sarvenaz Chaeibakhsh, Roya Sabbagh Novin, Tucker Hermans et al.

Despite years of research into patient falls in hospital rooms, falls and related injuries remain a serious concern to patient safety. In this work, we formulate a gradient-free constrained optimization problem to generate and reconfigure the hospital room interior layout to minimize the risk of falls. We define a cost function built on a hospital room fall model that takes into account the supportive or hazardous effect of the patient's surrounding objects, as well as simulated patient trajectories inside the room. We define a constraint set that ensures the functionality of the generated room layouts in addition to conforming to architectural guidelines. We solve this problem efficiently using a variant of simulated annealing. We present results for two real-world hospital room types and demonstrate a significant improvement of 18% on average in patient fall risk when compared with a traditional hospital room layout and 41% when compared with randomly generated layouts.
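The general shape of a constrained, gradient-free search by simulated annealing can be sketched as follows. This is a generic textbook version under stated assumptions, not the variant used in the paper: the one-dimensional `fall_risk` cost, the feasibility bounds, the geometric cooling schedule, and the handling of constraints by rejecting infeasible candidates are all hypothetical choices made for illustration.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor, is_feasible,
                        steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Generic simulated annealing with constraint handling by rejection."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        candidate = neighbor(current, rng)
        if not is_feasible(candidate):
            continue  # discard layouts that violate the constraint set
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


def fall_risk(x):
    """Hypothetical toy cost: risk is lowest when an object sits at x = 3."""
    return (x - 3.0) ** 2


best_x, best_risk = simulated_annealing(
    initial=0.0,
    cost=fall_risk,
    neighbor=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    is_feasible=lambda x: 0.0 <= x <= 5.0,  # stand-in for layout constraints
)
```

Because only cost evaluations are needed, the same loop applies when the cost comes from a simulation, such as the fall model and simulated patient trajectories mentioned in the abstract, rather than from a closed-form expression.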

RO · Jan 6, 2021
Safer Motion Planning of Steerable Needles via a Shaft-to-Tissue Force Model

Michael Bentley, Caleb Rucker, Chakravarthy Reddy et al.

Steerable needles are capable of accurately targeting difficult-to-reach clinical sites in the body. By bending around sensitive anatomical structures, steerable needles have the potential to reduce the invasiveness of many medical procedures. However, inserting these needles with curved trajectories increases the risk of tissue damage due to perpendicular forces exerted on the surrounding tissue by the needle's shaft, potentially resulting in lateral shearing through tissue. Such forces can cause significant damage to surrounding tissue, negatively affecting patient outcomes. In this work, we derive a tissue and needle force model based on a Cosserat string formulation, which describes the normal forces and frictional forces along the shaft as a function of the planned needle path, friction model and parameters, and tip piercing force. We propose this new force model and associated cost function as a safer and more clinically relevant metric than those currently used in motion planning for steerable needles. We fit and validate our model through physical needle robot experiments in a gel phantom. We use this force model to define a bottleneck cost function for motion planning and evaluate it against the commonly used path-length cost function in hundreds of randomly generated 3-D environments. Plans generated with our force-based cost show a 62% reduction in the peak modeled tissue force with only a 0.07% increase in length on average compared to using the path-length cost in planning. Additionally, we demonstrate the ability to plan motions with our force-based cost function in a lung tumor biopsy scenario from a segmented computed tomography (CT) scan. By planning motions for the needle that aim to minimize the modeled needle-to-tissue force explicitly, our method plans needle paths that may reduce the risk of significant tissue damage while still reaching desired targets in the body.
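The difference between a path-length cost and a bottleneck cost can be illustrated with a short sketch. The waypoints and per-segment force values below are hypothetical numbers chosen for the example; in the paper, the forces come from the Cosserat-string-based model, which is not reproduced here.

```python
import math


def path_length_cost(waypoints):
    """Additive cost: total Euclidean length of the piecewise-linear path."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))


def bottleneck_cost(segment_forces):
    """Bottleneck cost: the worst (peak) modeled force along the path."""
    return max(segment_forces)


# Two hypothetical candidate plans reaching the same target.
plans = {
    "direct": {
        "waypoints": [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
        "forces": [0.9],  # assumed peak force on the single segment
    },
    "detour": {
        "waypoints": [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
        "forces": [0.3, 0.4],  # assumed gentler forces on two segments
    },
}

shortest = min(plans, key=lambda n: path_length_cost(plans[n]["waypoints"]))
gentlest = min(plans, key=lambda n: bottleneck_cost(plans[n]["forces"]))
```

In this toy case the two costs prefer different plans: the path-length cost selects the shorter "direct" path, while the bottleneck cost selects the "detour" with the lower peak force.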

RO · Jul 1, 2019
Toward Asymptotically-Optimal Inspection Planning via Efficient Near-Optimal Graph Search

Mengyu Fu, Alan Kuntz, Oren Salzman et al.

Inspection planning, the task of planning motions that allow a robot to inspect a set of points of interest, has applications in domains such as industrial, field, and medical robotics. Inspection planning can be computationally challenging, as the search space over motion plans that inspect the points of interest grows exponentially with the number of inspected points. We propose a novel method, Incremental Random Inspection-roadmap Search (IRIS), that computes inspection plans whose length and set of inspected points asymptotically converge to those of an optimal inspection plan. IRIS incrementally densifies a motion planning roadmap using sampling-based algorithms, and performs efficient near-optimal graph search over the resulting roadmap as it is generated. We demonstrate IRIS's efficacy on a simulated planar 5DOF manipulator inspection task and on a medical endoscopic inspection task for a continuum parallel surgical robot in anatomy segmented from patient CT data. We show that IRIS computes higher-quality inspection paths orders of magnitude faster than a prior state-of-the-art method.

RO · Apr 29, 2019
Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning

Haonan Chen, Hao Tan, Alan Kuntz et al.

Enabling robots to understand instructions provided via spoken natural language would facilitate interaction between robots and people in a variety of settings in homes and workplaces. However, natural language instructions are often missing information that would be obvious to a human based on environmental context and common sense, and hence does not need to be explicitly stated. In this paper, we introduce Language-Model-based Commonsense Reasoning (LMCR), a new method which enables a robot to listen to a natural language instruction from a human, observe the environment around it, and automatically fill in information missing from the instruction using environmental context and a new commonsense reasoning approach. Our approach first converts an instruction provided as unconstrained natural language into a form that a robot can understand by parsing it into verb frames. Our approach then fills in missing information in the instruction by observing objects in its vicinity and leveraging commonsense reasoning. To learn commonsense reasoning automatically, our approach distills knowledge from large unstructured textual corpora by training a language model. Our results show the feasibility of a robot learning commonsense knowledge automatically from web-based textual corpora, and the power of learned commonsense reasoning models in enabling a robot to autonomously perform tasks based on incomplete natural language instructions.

RO · Jul 21, 2016
Interleaving Optimization with Sampling-Based Motion Planning (IOS-MP): Combining Local Optimization with Global Exploration

Alan Kuntz, Chris Bowen, Ron Alterovitz

Computing globally optimal motion plans for a robot is challenging in part because it requires analyzing a robot's configuration space simultaneously from both a macroscopic viewpoint (i.e., considering paths in multiple homotopic classes) and a microscopic viewpoint (i.e., locally optimizing path quality). We introduce Interleaved Optimization with Sampling-based Motion Planning (IOS-MP), a new method that effectively combines global exploration and local optimization to quickly compute high quality motion plans. Our approach combines two paradigms: (1) asymptotically-optimal sampling-based motion planning, which is effective at global exploration but relatively slow at locally refining paths, and (2) optimization-based motion planning, which locally optimizes paths quickly but lacks a global view of the configuration space. IOS-MP iteratively alternates between global exploration and local optimization, sharing information between the two, to improve motion planning efficiency. We evaluate IOS-MP as it scales with respect to dimensionality and complexity, as well as demonstrate its effectiveness on a 7-DOF manipulator for tasks specified using goal configurations and workspace goal regions.