Yuheng Zhi

h-index: 5

9 papers

94 citations

Novelty: 55%

AI Score: 42

Ranked #60,827 of 194,257 authors (top 31%) · #1,767 in RO (top 26%)

9 Papers

14.1 · LG · Jul 10

Iris Xu, Sunshine Jiang, John Marangola et al. · berkeley

Reinforcement learning (RL) is increasingly used to post-train vision-language-action (VLA) models, but every update consumes robot rollouts that are slow and costly to collect, making sample efficiency a central concern. Manipulation tasks typically provide only sparse rewards, so a weak policy fails almost every rollout early in training and has little to learn from, even when those failures execute coherent behavior. Such a failure, however, is a success at a different task. We present Learning from Hindsight (LfH), which brings hindsight relabeling to RL post-training of VLAs by scoring failed rollouts against the tasks they actually achieved. A single vision-language model relabels both the instruction and the reward, proposing a hindsight instruction for a group of failed rollouts and scoring how well each satisfies it, and the policy trains on the relabeled and original rollouts jointly. Because VLAs generalize across language, relabeling in language lets the policy learn more from the same trajectories. On out-of-distribution LIBERO-PRO tasks, where standard RL improves only slowly, LfH achieves a 5× improvement in sample efficiency and outperforms a dense progress-reward baseline. The gains hold across VLA backbones and on a physical Franka robot.
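The relabeling loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Rollout`, `hindsight_batch`, and `toy_relabeler` are hypothetical names, and the toy relabeler only stands in for the vision-language model by picking the most common final event among the failures.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rollout:
    instruction: str
    events: Tuple[str, ...]  # hypothetical placeholder for observations and actions
    reward: float


# Hypothetical interface for the relabeling model: given a group of failed
# rollouts, return one hindsight instruction and one score per rollout.
Relabeler = Callable[[List[Rollout]], Tuple[str, List[float]]]


def hindsight_batch(rollouts: List[Rollout], relabel: Relabeler) -> List[Rollout]:
    """Return the original rollouts plus relabeled copies of the failures."""
    failed = [r for r in rollouts if r.reward <= 0.0]
    batch = list(rollouts)
    if failed:
        instruction, scores = relabel(failed)
        batch.extend(
            replace(r, instruction=instruction, reward=score)
            for r, score in zip(failed, scores)
        )
    return batch


def toy_relabeler(failed: List[Rollout]) -> Tuple[str, List[float]]:
    """Toy stand-in for the VLM: the most common final event becomes the task."""
    finals = [r.events[-1] for r in failed]
    achieved = max(finals, key=finals.count)
    scores = [1.0 if r.events[-1] == achieved else 0.0 for r in failed]
    return achieved, scores


goal = "place the mug on the shelf"
rollouts = [
    Rollout(goal, ("grasp the mug", "place the mug on the table"), 0.0),
    Rollout(goal, ("grasp the mug", "place the mug on the table"), 0.0),
    Rollout(goal, ("grasp the mug", "drop the mug"), 0.0),
    Rollout(goal, ("grasp the mug", "place the mug on the shelf"), 1.0),
]
batch = hindsight_batch(rollouts, toy_relabeler)
```

In this sketch the policy would train on `batch`, which keeps every original rollout and adds one relabeled copy per failure, so the sparse original reward is never overwritten.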

2.8 · CV · Mar 31, 2023

SemHint-MD: Learning from Noisy Semantic Labels for Self-Supervised Monocular Depth Estimation

Shan Lin, Yuheng Zhi, Michael C. Yip

Without ground truth supervision, self-supervised depth estimation can be trapped in a local minimum due to the gradient-locality issue of the photometric loss. In this paper, we present a framework to enhance depth by leveraging semantic segmentation to guide the network to jump out of the local minimum. Prior works have proposed to share encoders between these two tasks or explicitly align them based on priors like the consistency between edges in the depth and segmentation maps. Yet, these methods usually require ground truth or high-quality pseudo labels, which may not be easily accessible in real-world applications. In contrast, we investigate self-supervised depth estimation along with a segmentation branch that is supervised with noisy labels provided by models pre-trained with limited data. We extend parameter sharing from the encoder to the decoder and study the influence of different numbers of shared decoder parameters on model performance. Also, we propose to use cross-task information to refine current depth and segmentation predictions to generate pseudo-depth and semantic labels for training. The advantages of the proposed method are demonstrated through extensive experiments on the KITTI benchmark and a downstream task for endoscopic tissue deformation tracking.

2.2 · RO · Sep 24, 2024

SurgIRL: Towards Life-Long Learning for Surgical Automation by Incremental Reinforcement Learning

Yun-Jie Ho, Zih-Yun Chiu, Yuheng Zhi et al.

Surgical automation holds immense potential to improve the outcome and accessibility of surgery. Recent studies use reinforcement learning to learn policies that automate different surgical tasks. However, these policies are developed independently and are limited in their reusability when the task changes, making it more time-consuming when robots learn to solve multiple tasks. Inspired by how human surgeons build their expertise, we train surgical automation policies through Surgical Incremental Reinforcement Learning (SurgIRL). SurgIRL aims to (1) acquire new skills by referring to external policies (knowledge) and (2) accumulate and reuse these skills to solve multiple unseen tasks incrementally (incremental learning). Our SurgIRL framework includes three major components. We first define an expandable knowledge set containing heterogeneous policies that can be helpful for surgical tasks. Then, we propose Knowledge Inclusive Attention Network with mAximum Coverage Exploration (KIAN-ACE), which improves learning efficiency by maximizing the coverage of the knowledge set during the exploration process. Finally, we develop incremental learning pipelines based on KIAN-ACE to accumulate and reuse learned knowledge and solve multiple surgical tasks sequentially. Our simulation experiments show that KIAN-ACE efficiently learns to automate ten surgical tasks separately or incrementally. We also evaluate our learned policies on the da Vinci Research Kit (dVRK) and demonstrate successful sim-to-real transfers.

4.1 · RO · Sep 29, 2024

KineDepth: Utilizing Robot Kinematics for Online Metric Depth Estimation

Soofiyan Atar, Yuheng Zhi, Florian Richter et al.

Depth perception is essential for a robot's spatial and geometric understanding of its environment, with many tasks traditionally relying on hardware-based depth sensors like RGB-D or stereo cameras. However, these sensors face practical limitations, including issues with transparent and reflective objects, high costs, calibration complexity, spatial and energy constraints, and increased failure rates in compound systems. While monocular depth estimation methods offer a cost-effective and simpler alternative, their adoption in robotics is limited due to their output of relative rather than metric depth, which is crucial for robotics applications. In this paper, we propose a method that utilizes a single calibrated camera, enabling the robot to act as a "measuring stick" to convert relative depth estimates into metric depth in real-time as tasks are performed. Our approach employs an LSTM-based metric depth regressor, trained online and refined through probabilistic filtering, to accurately restore the metric depth across the monocular depth map, particularly in areas proximal to the robot's motion. Experiments with real robots demonstrate that our method significantly outperforms current state-of-the-art monocular metric depth estimation techniques, achieving a 22.1% reduction in depth error and a 52% increase in success rate for a downstream task.
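The "measuring stick" idea can be illustrated with a much simpler model than the one in the abstract. The sketch below swaps the LSTM-based regressor and probabilistic filtering for a closed-form least-squares fit of one scale and one shift, assuming relative depth is related to metric depth by an affine map. The function name and all sample values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_scale_shift(
    relative: Sequence[float], metric: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of metric ~= scale * relative + shift."""
    if len(relative) != len(metric) or len(relative) < 2:
        raise ValueError("need at least two paired samples")
    n = len(relative)
    mean_r = sum(relative) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in relative)
    if var_r == 0.0:
        raise ValueError("relative depths must not all be equal")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))
    scale = cov / var_r
    shift = mean_m - scale * mean_r
    return scale, shift


# Made-up samples: relative depth at pixels on the robot, and the metric depth
# (metres) of those same points as it would be computed from robot kinematics.
relative_at_robot = [0.2, 0.4, 0.6, 0.8]
metric_from_kinematics = [0.5, 0.9, 1.3, 1.7]

scale, shift = fit_scale_shift(relative_at_robot, metric_from_kinematics)

# Apply the fitted map to other pixels of the relative depth map.
metric_depth = [scale * d + shift for d in [0.3, 0.5]]
```

A global affine fit like this cannot capture spatially varying error, which is one plausible reason to prefer a learned regressor that is refined online near the robot's motion.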

13.0 · RO · Mar 11

SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning

Anlun Huang, Zhenyu Wu, Soofiyan Atar et al.

Stabilizing unsecured payloads against the inherent oscillations of dynamic bipedal locomotion remains a critical engineering bottleneck for humanoids in unstructured environments. To solve this, we introduce ReST-RL, a hierarchical reinforcement learning architecture that explicitly decouples locomotion from payload stabilization, evaluated via the SteadyTray benchmark. Rather than relying on monolithic end-to-end learning, our framework integrates a robust base locomotion policy with a dynamic residual module engineered to actively cancel gait-induced perturbations at the end-effector. This architectural separation ensures steady tray transport without degrading the underlying bipedal stability. In simulation, the residual design significantly outperforms end-to-end baselines in gait smoothness and orientation accuracy, achieving a 96.9% success rate in variable velocity tracking and 74.5% robustness against external force disturbances. Successfully deployed on the Unitree G1 humanoid hardware, this modular approach demonstrates highly reliable zero-shot sim-to-real generalization across various objects and external force disturbances.
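The general residual-policy pattern behind this kind of design can be sketched as below. The abstract does not say how ReST-RL combines its two modules, so the additive composition, the `residual_scale` factor, the clipping limit, and both toy policies are assumptions made purely for illustration.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def residual_policy(
    base: Policy, residual: Policy, residual_scale: float = 0.1, limit: float = 1.0
) -> Policy:
    """Compose a base policy with a scaled, additive correction (assumed form)."""

    def act(observation: Sequence[float]) -> List[float]:
        base_action = base(observation)
        correction = residual(observation)
        return [
            max(-limit, min(limit, a + residual_scale * c))
            for a, c in zip(base_action, correction)
        ]

    return act


def toy_base(observation: Sequence[float]) -> List[float]:
    """Hypothetical base controller that ignores the tray entirely."""
    return [0.5, 0.95]


def toy_residual(observation: Sequence[float]) -> List[float]:
    """Hypothetical correction that pushes against the observed tray tilt."""
    tilt = observation[0]
    return [-tilt, tilt]


policy = residual_policy(toy_base, toy_residual)
action = policy([1.0])
```

Keeping the correction small and bounded is a common way to let the residual adjust the output without overriding the base controller.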

6.9 · RO · Jan 12, 2022

Configuration Space Decomposition for Scalable Proxy Collision Checking in Robot Planning and Control

Mrinal Verghese, Nikhil Das, Yuheng Zhi et al.

Real-time robot motion planning in complex high-dimensional environments remains an open problem. Motion planning algorithms, and their underlying collision checkers, are crucial to any robot control stack. Collision checking takes up a large portion of the computational time in robot motion planning. Existing collision checkers make trade-offs between speed and accuracy and scale poorly to high-dimensional, complex environments. We present a novel space decomposition method using K-Means clustering in the Forward Kinematics space to accelerate proxy collision checking. We train individual configuration space models using Fastron, a kernel perceptron algorithm, on these decomposed subspaces, yielding compact yet highly accurate models that can be queried rapidly and scale better to more complex environments. We demonstrate this new method, called Decomposed Fast Perceptron (D-Fastron), on the 7-DOF Baxter robot producing on average 29x faster collision checks and up to 9.8x faster motion planning compared to state-of-the-art geometric collision checkers.
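The decomposition step can be sketched as follows: cluster samples by their forward-kinematics image, train one small model per cluster, and route each query to the model of its nearest cluster. This is an illustrative sketch only. The two-link planar arm, the circular obstacle, and all function names are hypothetical, and a 1-nearest-neighbour lookup stands in for the Fastron kernel perceptron used by the authors.

```python
from math import cos, dist, pi, sin
from typing import Dict, List, Sequence, Tuple

Config = Tuple[float, float]
Point = Tuple[float, ...]
Sample = Tuple[Config, int]


def forward_kinematics(q: Config) -> Point:
    """End-effector position of a hypothetical planar arm with two unit links."""
    return (cos(q[0]) + cos(q[0] + q[1]), sin(q[0]) + sin(q[0] + q[1]))


def nearest(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the closest center (lowest index wins ties)."""
    return min(range(len(centers)), key=lambda j: dist(point, centers[j]))


def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[Point]:
    """Plain Lloyd's algorithm with a deterministic initialisation."""
    centers = points[:: max(1, len(points) // k)][:k]
    for _ in range(iterations):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def build_models(samples: List[Sample], centers: List[Point]) -> Dict[int, List[Sample]]:
    """Group labelled configurations by the cluster of their FK image."""
    models: Dict[int, List[Sample]] = {}
    for q, label in samples:
        models.setdefault(nearest(forward_kinematics(q), centers), []).append((q, label))
    return models


def predict(q: Config, centers: List[Point], models: Dict[int, List[Sample]]) -> int:
    """Route the query to its cluster, then query only that cluster's model."""
    fk = forward_kinematics(q)
    cluster = min(sorted(models), key=lambda j: dist(fk, centers[j]))
    _, label = min(models[cluster], key=lambda s: dist(q, s[0]))
    return label


def in_collision(q: Config) -> int:
    """Hypothetical ground truth: +1 if the end effector is inside a disc obstacle."""
    return 1 if dist(forward_kinematics(q), (1.0, 1.0)) < 0.6 else -1


grid = [-pi + i * (2 * pi / 8) for i in range(8)]
samples = [((q1, q2), in_collision((q1, q2))) for q1 in grid for q2 in grid]
centers = kmeans([forward_kinematics(q) for q, _ in samples], k=4)
models = build_models(samples, centers)
```

Each query touches only one cluster's samples, which is the source of the speed-up the decomposition is aiming for: the per-cluster models stay small even as the full sample set grows.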

3.0 · RO · Apr 15, 2021

Data-driven Actuator Selection for Artificial Muscle-Powered Robots

Taylor West Henderson, Yuheng Zhi, Angela Liu et al.

Even though artificial muscles have gained popularity due to their compliant, flexible, and compact properties, there is currently no easy way to make informed decisions on the appropriate actuation strategy when designing a muscle-powered robot, which limits the transition of such technologies into broader applications. What's more, when a new muscle actuation technology is developed, it is difficult to compare it against existing robot muscles. To accelerate the development of artificial muscle applications, we propose a data-driven approach for robot muscle actuator selection using Support Vector Machines (SVM). This first-of-its-kind method gives users insight into which actuators fit their specific needs and actuation performance criteria, making it possible for researchers and engineers with little to no prior knowledge of artificial muscles to focus on application design. It also provides a platform to benchmark existing, new, or yet-to-be-discovered artificial muscle technologies. We test our method on unseen existing robot muscle designs to prove its usability on real-world applications. We provide an open-access, web-searchable interface for easy access to our models that will additionally allow for continuous contribution of new actuator data from groups around the world to enhance and expand these models.

12.8 · RO · Feb 15, 2021 · Code

DiffCo: Auto-Differentiable Proxy Collision Detection with Multi-class Labels for Safety-Aware Trajectory Optimization

Yuheng Zhi, Nikhil Das, Michael Yip

The objective of trajectory optimization algorithms is to achieve an optimal collision-free path between a start and goal state. In real-world scenarios where environments can be complex and non-homogeneous, a robot needs to be able to gauge whether a state will be in collision with various objects in order to meet some safety metrics. The collision detector should be computationally efficient and, ideally, analytically differentiable to facilitate stable and rapid gradient descent during optimization. However, methods today lack an elegant approach to detect collision differentiably, relying rather on numerical gradients that can be unstable. We present DiffCo, the first, fully auto-differentiable, non-parametric model for collision detection. Its non-parametric behavior allows one to compute collision boundaries on-the-fly and update them, requiring no pre-training and allowing it to update continuously in dynamic environments. It provides robust gradients for trajectory optimization via backpropagation and is often 10-100x faster to compute than its geometric counterparts. DiffCo also extends trivially to modeling different object collision classes for semantically informed trajectory optimization.
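To show why a differentiable collision score helps trajectory optimization, the sketch below uses a kernel-weighted score over stored support configurations and takes one gradient step away from collision. DiffCo obtains its gradients by automatic differentiation. Here the derivative is written out by hand so the example runs without an autodiff library, and the Gaussian kernel, support points, weights, and step size are all illustrative assumptions.

```python
from math import exp
from typing import List, Sequence


def collision_score(
    x: Sequence[float],
    supports: Sequence[Sequence[float]],
    weights: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Kernel-weighted score; in this sketch, higher means closer to collision."""
    return sum(
        w * exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, s)))
        for s, w in zip(supports, weights)
    )


def collision_gradient(
    x: Sequence[float],
    supports: Sequence[Sequence[float]],
    weights: Sequence[float],
    gamma: float = 1.0,
) -> List[float]:
    """Analytic gradient of collision_score with respect to x."""
    grad = [0.0] * len(x)
    for s, w in zip(supports, weights):
        k = exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, s)))
        for d in range(len(x)):
            grad[d] += w * k * (-2.0 * gamma) * (x[d] - s[d])
    return grad


# Hypothetical support configurations: positive weights mark colliding samples,
# negative weights mark collision-free samples.
supports = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.0)]
weights = [1.0, 0.8, -0.5]

x = [0.3, 0.2]
gradient = collision_gradient(x, supports, weights)

# One gradient-descent step that lowers the collision score.
step_size = 0.1
x_next = [xi - step_size * gi for xi, gi in zip(x, gradient)]
```

Because the score is a smooth function of the configuration, the same gradient could be added to a trajectory cost and minimized jointly with path length or smoothness terms.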

10.9 · RO · Sep 23, 2018

Augmented Reality Predictive Displays to Help Mitigate the Effects of Delayed Telesurgery

Florian Richter, Yifei Zhang, Yuheng Zhi et al.

Surgical robots offer the exciting potential for remote telesurgery, but advances are needed to make this technology efficient and accurate to ensure patient safety. Achieving these goals is hindered by the deleterious effects of latency between the remote operator and the bedside robot. Predictive displays have found success in overcoming these effects by giving the operator immediate visual feedback. However, previously developed predictive displays cannot be directly applied to telesurgery due to the unique challenges in tracking the 3D geometry of the surgical environment. In this paper, we present the first predictive display for teleoperated surgical robots. The predictive display is stereoscopic, utilizes Augmented Reality (AR) to show the predicted motions alongside the complex tissue found in-situ within surgical environments, and overcomes the challenges in accurately tracking slave-tools in real-time. We call this a Stereoscopic AR Predictive Display (SARPD). To test the SARPD's performance, we conducted a user study with ten participants on the da Vinci® Surgical System. The results showed with statistical significance that using SARPD decreased task completion time while having no effect on error rates when operating under delay.