Zonghe Chua

RO
10papers
103citations
Novelty38%
AI Score48

10 Papers

16.3ROMay 29
Shaft-integrated Force Sensing with Transformer-based Dynamics Compensation for Telesurgery

Shuyuan Yang, Grant Boone, Timo Markert et al.

Robot-Assisted Minimally Invasive Surgery (RAMIS) enhances surgeon dexterity, with newer platforms leveraging haptic feedback to further improve performance. Such force information has broader potential to inform performance assessment, tactile localization, and surgical autonomy. This motivates the need for accessible approaches to integrating force sensing into RAMIS tools. This work presents a method for integrating a six-axis commercial force sensor into the distal end of a standard cable-driven surgical instrument, enabling end-effector force measurement while preserving the original mechanical functionality of the device. The proposed design emphasizes reproducibility and accessibility for research applications, requiring no specialized manufacturing tools. A transformer neural network integrates force sensor measurements with robot state information to aid estimation of applied forces at the end-effector, compensating for internal cable forces arising from actuation. Our proposed approach achieved normalized errors below 6%, and generalized to unseen conditions better than purely proximal data-driven sensing approaches. High internal cable forces caused sensor saturation and reduced axial force observability, which can degrade performance along the tool's major axis and under higher load conditions. Given current levels of performance, the balance of system integrability and performance enables applications and research into timely topics of haptic feedback, skill assessment, and force-informed autonomy in RAMIS. Videos and code are available at https://enhanced-telerobotics.github.io/shaft force sensing.

4.2ROMay 19
Data-centric Design of Learning-based Surgical Gaze Perception Models in Multi-Task Simulation

Yizhou Li, Shuyuan Yang, Jiaji Su et al.

In robot-assisted minimally invasive surgery (RMIS), reduced haptic feedback and depth cues increase reliance on expert visual perception, motivating gaze-guided training and learning-based surgical perception models. However, operative expert gaze is costly to collect, and it remains unclear how the source of gaze supervision, both expertise level (intermediate vs. novice) and perceptual modality (active execution vs. passive viewing), shapes what attention models learn. We introduce a paired active-passive, multi-task surgical gaze dataset collected on the da Vinci SimNow simulator across four drills. Active gaze was recorded during task execution using a VR headset with eye tracking, and the corresponding videos were reused as stimuli to collect passive gaze from observers, enabling controlled same-video comparisons. We quantify skill- and modality-dependent differences in gaze organization and evaluate the substitutability of passive gaze for operative supervision using fixation density overlap analyses and single-frame saliency modeling. Across settings, MSI-Net produced stable, interpretable predictions, whereas SalGAN was unstable and often poorly aligned with human fixations. Models trained on passive gaze recovered a substantial portion of intermediate active attention, but with predictable degradation, and transfer was asymmetric between active and passive targets. Notably, novice passive labels approximated intermediate-passive targets with limited loss on higher-quality demonstrations, suggesting a practical path for scalable, crowd-sourced gaze supervision in surgical coaching and perception modeling.

18.1ROMay 1
A Model-based Visual Contact Localization and Force Sensing System for Compliant Robotic Grippers

Kaiwen Zuo, Shuyuan Yang, Zonghe Chua

Grasp force estimation can help prevent robots from damaging delicate objects during manipulation and improve learning-based robotic control. Integrating force sensing into deformable grippers negotiates trade-offs in cost, complexity, mechanical robustness, and performance. With the growing integration of RGB-D wrist cameras into robotic systems for control purposes, camera-based techniques are a promising solution for indirect visual force estimation. Current approaches mostly utilize end-to-end deep learning, which can be brittle when generalizing to new scenarios, while existing model-based approaches are unsuited to grasping and modern grasper geometries. To address these challenges, we developed a model-based visual force sensing approach integrating an iterative contact localization with generalization to unseen objects. The system extracts structural key points from wrist camera RGB-D images of deforming fin-ray-shaped soft grippers, and uses these key points to define parameters of an inverse finite element analysis simulation in Simulation Open Framework Architecture. The iterative contact localization sub-system utilizes a deep learning-based online 3D reconstruction and pose estimation pipeline to dynamically update contact location, and is robust to visual occlusion and unseen objects. Our system demonstrated an average root mean square error of 0.23 N and normalized root mean square deviation of 2.11% during the load phase, and 0.48 N and 4.34% over the entire grasping process when interacting with different objects under various conditions, showcasing its potential for real-time model-based indirect force sensing of soft grippers.

54.8ROMar 16
Real-time Capable Learning-based Visual Tool Pose Correction via Differentiable Simulation

Shuyuan Yang, Zonghe Chua

Autonomy in robot-assisted minimally invasive surgery has the potential to reduce surgeon cognitive and task load, thereby increasing procedural efficiency. However, implementing accurate autonomous control can be difficult due to poor end-effector proprioception. Joint encoder readings are typically inaccurate due to kinematic non-idealities in their cable-driven transmissions. Vision-based pose estimation approaches are highly effective, but lack real-time capability, generalizability, or can be hard to train. In this work, we demonstrate a real-time capable, Vision Transformer-based pose estimation approach that is trained using end-to-end differentiable kinematics and rendering. We demonstrate the potential of this approach to correct for noisy pose estimates through a real robot dataset and the potential real-time processing ability. Our approach is able to reduce more than 50% of hand-eye translation errors in the dataset, reaching the same performance level as an existing optimization-based method. Our approach is four times faster, and capable of near real-time inference at 22 Hz. A zero-shot prediction on an unseen dataset shows good generalization ability, and can be further finetuned for increased performance without human labeling.

ROSep 24, 2021
A 4-DoF Parallel Origami Haptic Device for Normal, Shear, and Torsion Feedback

Sophia R. Williams, Jacob M. Suchoski, Zonghe Chua et al.

We present a mesoscale finger-mounted 4-degree-of-freedom (DoF) haptic device that is created using origami fabrication techniques. The 4-DoF device is a parallel kinematic mechanism capable of delivering normal, shear, and torsional haptic feedback to the fingertip. Traditional methods of robot fabrication are not well suited for designing small robotic devices because it is challenging and expensive to manufacture small, low-friction joints. Our device uses origami manufacturing principles to reduce complexity and the device footprint. We characterize the bandwidth, workspace, and force output of the device. The capabilities of the torsion-DoF are demonstrated in a virtual reality scenario. Our results show that the device can deliver haptic feedback in 4-DoFs with an effective operational workspace of 0.64cm$^3$ with $\pm 30 ^ \circ$ rotation at every location. The maximum forces and torques the device can apply in the x-, y-, z-, and $θ$-directions, are $\pm$1.5N, $\pm$1.5N, 2N, and 5N$\cdot$mm, respectively, and the device has an operating bandwidth of 9Hz.

ROSep 23, 2021
Robot-Assisted Surgical Training Over Several Days in a Virtual Surgical Environment with Divergent and Convergent Force Fields

Yousi A. Oquendo, Zonghe Chua, Margaret M. Coad et al.

Surgical procedures require a high level of technical skill to ensure efficiency and patient safety. Due to the direct effect of surgeon skill on patient outcomes, the development of cost-effective and realistic training methods is imperative to accelerate skill acquisition. Teleoperated robotic devices allow for intuitive ergonomic control, but the learning curve for these systems remains steep. Recent studies in motor learning have shown that visual or physical exaggeration of errors helps trainees to learn to perform tasks faster and more accurately. In this study, we extended the work from two previous studies to investigate the performance of subjects in different force field training conditions, including convergent (assistive), divergent (resistive), and no force field (null).

ROSep 23, 2021
Characterization of Real-time Haptic Feedback from Multimodal Neural Network-based Force Estimates during Teleoperation

Zonghe Chua, Allison M. Okamura

Force estimation using neural networks is a promising approach to enable haptic feedback in minimally invasive surgical robots without end-effector force sensors. Various network architectures have been proposed, but none have been tested in real time with surgical-like manipulations. Thus, questions remain about the real-time transparency and stability of force feedback from neural network-based force estimates. We characterize the real-time impedance transparency and stability of force feedback rendered on a da Vinci Research Kit teleoperated surgical robot using neural networks with vision-only, state-only, and state and vision inputs. Networks were trained on an existing dataset of teleoperated manipulations without force feedback. To measure real-time stability and transparency during teleoperation with force feedback to the operator, we modeled a one-degree-of-freedom human and surgeon-side manipulandum that moved the patient-side robot to perform manipulations on silicone artificial tissue over various robot and camera configurations, and tools. We found that the networks using state inputs displayed more transparent impedance than a vision-only network. However, state-based networks displayed large instability when used to provide force feedback during lateral manipulation of the silicone. In contrast, the vision-only network showed consistent stability in all the evaluated directions. We confirmed the performance of the vision-only network for real-time force feedback in a demonstration with a human teleoperator.

RONov 4, 2020
Toward Force Estimation in Robot-Assisted Surgery using Deep Learning with Vision and Robot State

Zonghe Chua, Anthony M. Jarc, Allison M. Okamura

Knowledge of interaction forces during teleoperated robot-assisted surgery could be used to enable force feedback to human operators and evaluate tissue handling skill. However, direct force sensing at the end-effector is challenging because it requires biocompatible, sterilizable, and cost-effective sensors. Vision-based deep learning using convolutional neural networks is a promising approach for providing useful force estimates, though questions remain about generalization to new scenarios and real-time inference. We present a force estimation neural network that uses RGB images and robot state as inputs. Using a self-collected dataset, we compared the network to variants that included only a single input type, and evaluated how they generalized to new viewpoints, workspace positions, materials, and tools. We found that vision-based networks were sensitive to shifts in viewpoints, while state-only networks were robust to changes in workspace. The network with both state and vision inputs had the highest accuracy for an unseen tool, and was moderately robust to changes in viewpoints. Through feature removal studies, we found that using only position features produced better accuracy than using only force features as input. The network with both state and vision inputs outperformed a physics-based baseline model in accuracy. It showed comparable accuracy but faster computation times than a baseline recurrent neural network, making it better suited for real-time applications.

ROMay 23, 2020
Evaluation of Non-Collocated Force Feedback Driven by Signal-Independent Noise

Zonghe Chua, Allison M. Okamura, Darrel R. Deo

Individuals living with paralysis or amputation can operate robotic prostheses using input signals based on their intent or attempt to move. Because sensory function is lost or diminished in these individuals, haptic feedback must be non-collocated. The intracortical brain computer interface (iBCI) has enabled a variety of neural prostheses for people with paralysis. An important attribute of the iBCI is that its input signal contains signal-independent noise. To understand the effects of signal-independent noise on a system with non-collocated haptic feedback and inform iBCI-based prostheses control strategies, we conducted an experiment with a conventional haptic interface as a proxy for the iBCI. Able-bodied users were tasked with locating an indentation within a virtual environment using input from their right hand. Non-collocated haptic feedback of the interaction forces in the virtual environment was augmented with noise of three different magnitudes and simultaneously rendered on users' left hands. We found increases in distance error of the guess of the indentation location, mean time per trial, mean peak absolute displacement and speed of tool movements during localization for the highest noise level compared to the other two levels. The findings suggest that users have a threshold of disturbance rejection and that they attempt to increase their signal-to-noise ratio through their exploratory actions.

ROApr 28, 2020
Task Dynamics of Prior Training Influence Visual Force Estimation Ability During Teleoperation

Zonghe Chua, Anthony M. Jarc, Sherry Wren et al.

The lack of haptic feedback in Robot-assisted Minimally Invasive Surgery (RMIS) is a potential barrier to safe tissue handling during surgery. Bayesian modeling theory suggests that surgeons with experience in open or laparoscopic surgery can develop priors of tissue stiffness that translate to better force estimation abilities during RMIS compared to surgeons with no experience. To test if prior haptic experience leads to improved force estimation ability in teleoperation, 33 participants were assigned to one of three training conditions: manual manipulation, teleoperation with force feedback, or teleoperation without force feedback, and learned to tension a silicone sample to a set of force values. They were then asked to perform the tension task, and a previously unencountered palpation task, to a different set of force values under teleoperation without force feedback. Compared to the teleoperation groups, the manual group had higher force error in the tension task outside the range of forces they had trained on, but showed better speed-accuracy functions in the palpation task at low force levels. This suggests that the dynamics of the training modality affect force estimation ability during teleoperation, with the prior haptic experience accessible if formed under the same dynamics as the task.