Changhao Wang

RO
h-index34
16papers
216citations
Novelty57%
AI Score54

16 Papers

ROOct 1, 2022
Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning

Zheng Wu, Yichen Xie, Wenzhao Lian et al.

Humans are capable of abstracting various tasks as different combinations of multiple attributes. This perspective of compositionality is vital for human rapid learning and adaption since previous experiences from related tasks can be combined to generalize across novel compositional settings. In this work, we aim to achieve zero-shot policy generalization of Reinforcement Learning (RL) agents by leveraging the task compositionality. Our proposed method is a meta- RL algorithm with disentangled task representation, explicitly encoding different aspects of the tasks. Policy generalization is then performed by inferring unseen compositional task representations via the obtained disentanglement without extra exploration. The evaluation is conducted on three simulated tasks and a challenging real-world robotic insertion task. Experimental results demonstrate that our proposed method achieves policy generalization to unseen compositional tasks in a zero-shot manner.

RODec 9, 2025Code
OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer

Jessica Yin, Haozhi Qi, Youngsun Wi et al.

Human video demonstrations provide abundant training data for learning robot policies, but video alone cannot capture the rich contact signals critical for mastering manipulation. We introduce OSMO, an open-source wearable tactile glove designed for human-to-robot skill transfer. The glove features 12 three-axis tactile sensors across the fingertips and palm and is designed to be compatible with state-of-the-art hand-tracking methods for in-the-wild data collection. We demonstrate that a robot policy trained exclusively on human demonstrations collected with OSMO, without any real robot data, is capable of executing a challenging contact-rich manipulation task. By equipping both the human and the robot with the same glove, OSMO minimizes the visual and tactile embodiment gap, enabling the transfer of continuous shear and normal force feedback while avoiding the need for image inpainting or other vision-based force inference. On a real-world wiping task requiring sustained contact pressure, our tactile-aware policy achieves a 72% success rate, outperforming vision-only baselines by eliminating contact-related failure modes. We release complete hardware designs, firmware, and assembly instructions to support community adoption.

RONov 12, 2025
SPIDER: Scalable Physics-Informed Dexterous Retargeting

Chaoyi Pan, Changhao Wang, Haozhi Qi et al.

Learning dexterous and agile policy for humanoid and dexterous hand control requires large-scale demonstrations, but collecting robot-specific data is prohibitively expensive. In contrast, abundant human motion data is readily available from motion capture, videos, and virtual reality, which could help address the data scarcity problem. However, due to the embodiment gap and missing dynamic information like force and torque, these demonstrations cannot be directly executed on robots. To bridge this gap, we propose Scalable Physics-Informed DExterous Retargeting (SPIDER), a physics-based retargeting framework to transform and augment kinematic-only human demonstrations to dynamically feasible robot trajectories at scale. Our key insight is that human demonstrations should provide global task structure and objective, while large-scale physics-based sampling with curriculum-style virtual contact guidance should refine trajectories to ensure dynamical feasibility and correct contact sequences. SPIDER scales across diverse 9 humanoid/dexterous hand embodiments and 6 datasets, improving success rates by 18% compared to standard sampling, while being 10X faster than reinforcement learning (RL) baselines, and enabling the generation of a 2.4M frames dynamic-feasible robot dataset for policy learning. As a universal physics-based retargeting method, SPIDER can work with diverse quality data and generate diverse and high-quality data to enable efficient policy learning with methods like RL.

ROFeb 6, 2025
DexterityGen: Foundation Controller for Unprecedented Dexterity

Zhao-Heng Yin, Changhao Wang, Luis Pineda et al. · cmu, meta-ai

Teaching robots dexterous manipulation skills, such as tool use, presents a significant challenge. Current approaches can be broadly categorized into two strategies: human teleoperation (for imitation learning) and sim-to-real reinforcement learning. The first approach is difficult as it is hard for humans to produce safe and dexterous motions on a different embodiment without touch feedback. The second RL-based approach struggles with the domain gap and involves highly task-specific reward engineering on complex tasks. Our key insight is that RL is effective at learning low-level motion primitives, while humans excel at providing coarse motion commands for complex, long-horizon tasks. Therefore, the optimal solution might be a combination of both approaches. In this paper, we introduce DexterityGen (DexGen), which uses RL to pretrain large-scale dexterous motion primitives, such as in-hand rotation or translation. We then leverage this learned dataset to train a dexterous foundational controller. In the real world, we use human teleoperation as a prompt to the controller to produce highly dexterous behavior. We evaluate the effectiveness of DexGen in both simulation and real world, demonstrating that it is a general-purpose controller that can realize input dexterous manipulation commands and significantly improves stability by 10-100x measured as duration of holding objects across diverse tasks. Notably, with DexGen we demonstrate unprecedented dexterous skills including diverse object reorientation and dexterous tool use such as pen, syringe, and screwdriver for the first time.

LGFeb 3
UniGeM: Unifying Data Mixing and Selection via Geometric Exploration and Mining

Changhao Wang, Yunfei Yu, Xinhao Yao et al.

The scaling of Large Language Models (LLMs) is increasingly limited by data quality. Most methods handle data mixing and sample selection separately, which can break the structure in code corpora. We introduce \textbf{UniGeM}, a framework that unifies mixing and selection by treating data curation as a \textit{manifold approximation} problem without training proxy models or relying on external reference datasets. UniGeM operates hierarchically: \textbf{Macro-Exploration} learns mixing weights with stability-based clustering; \textbf{Micro-Mining} filters high-quality instances by their geometric distribution to ensure logical consistency. Validated by training 8B and 16B MoE models on 100B tokens, UniGeM achieves \textbf{2.0$\times$ data efficiency} over a random baseline and further improves overall performance compared to SOTA methods in reasoning-heavy evaluations and multilingual generalization.

ROMar 10, 2025
Geometric Retargeting: A Principled, Ultrafast Neural Hand Retargeting Algorithm

Zhao-Heng Yin, Changhao Wang, Luis Pineda et al.

We introduce Geometric Retargeting (GeoRT), an ultrafast, and principled neural hand retargeting algorithm for teleoperation, developed as part of our recent Dexterity Gen (DexGen) system. GeoRT converts human finger keypoints to robot hand keypoints at 1KHz, achieving state-of-the-art speed and accuracy with significantly fewer hyperparameters. This high-speed capability enables flexible postprocessing, such as leveraging a foundational controller for action correction like DexGen. GeoRT is trained in an unsupervised manner, eliminating the need for manual annotation of hand pairs. The core of GeoRT lies in novel geometric objective functions that capture the essence of retargeting: preserving motion fidelity, ensuring configuration space (C-space) coverage, maintaining uniform response through high flatness, pinch correspondence and preventing self-collisions. This approach is free from intensive test-time optimization, offering a more scalable and practical solution for real-time hand retargeting.

RONov 20, 2025
Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations

Irmak Guzey, Haozhi Qi, Julen Urain et al. · cmu, meta-ai

Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. Achieving this would mark significant progress toward generalizable robot manipulation in human environments, as it would reduce the reliance on labor-intensive robot data collection. Despite substantial efforts, progress toward this goal has been bottle-necked by the embodiment gap between humans and robots, as well as by difficulties in extracting relevant contextual and motion cues that enable learning of autonomous policies from in-the-wild human videos. We claim that with simple yet sufficiently powerful hardware for obtaining human data and our proposed framework AINA, we are now one significant step closer to achieving this dream. AINA enables learning multi-fingered policies from data collected by anyone, anywhere, and in any environment using Aria Gen 2 glasses. These glasses are lightweight and portable, feature a high-resolution RGB camera, provide accurate on-board 3D head and hand poses, and offer a wide stereo view that can be leveraged for depth estimation of the scene. This setup enables the learning of 3D point-based policies for multi-fingered hands that are robust to background changes and can be deployed directly without requiring any robot data (including online corrections, reinforcement learning, or simulation). We compare our framework against prior human-to-robot policy learning approaches, ablate our design choices, and demonstrate results across nine everyday manipulation tasks. Robot rollouts are best viewed on our website: https://aina-robot.github.io.

AIOct 18, 2025
RGMem: Renormalization Group-based Memory Evolution for Language Agent User Profile

Ao Tian, Yunfeng Lu, Xinxin Fan et al.

Personalized and continuous interactions are the key to enhancing user experience in today's large language model (LLM)-based conversational systems, however, the finite context windows and static parametric memory make it difficult to model the cross-session long-term user states and behavioral consistency. Currently, the existing solutions to this predicament, such as retrieval-augmented generation (RAG) and explicit memory systems, primarily focus on fact-level storage and retrieval, lacking the capability to distill latent preferences and deep traits from the multi-turn dialogues, which limits the long-term and effective user modeling, directly leading to the personalized interactions remaining shallow, and hindering the cross-session continuity. To realize the long-term memory and behavioral consistency for Language Agents in LLM era, we propose a self-evolving memory framework RGMem, inspired by the ideology of classic renormalization group (RG) in physics, this framework enables to organize the dialogue history in multiple scales: it first extracts semantics and user insights from episodic fragments, then through hierarchical coarse-graining and rescaling operations, progressively forms a dynamically-evolved user profile. The core innovation of our work lies in modeling memory evolution as a multi-scale process of information compression and emergence, which accomplishes the high-level and accurate user profiles from noisy and microscopic-level interactions.

AIOct 18, 2025
DTKG: Dual-Track Knowledge Graph-Verified Reasoning Framework for Multi-Hop QA

Changhao Wang, Yanfang Liu, Xinxin Fan et al.

Multi-hop reasoning for question answering (QA) plays a critical role in retrieval-augmented generation (RAG) for modern large language models (LLMs). The accurate answer can be obtained through retrieving relational structure of entities from knowledge graph (KG). Regarding the inherent relation-dependency and reasoning pattern, multi-hop reasoning can be in general classified into two categories: i) parallel fact-verification multi-hop reasoning question, i.e., requiring simultaneous verifications of multiple independent sub-questions; and ii) chained multi-hop reasoning questions, i.e., demanding sequential multi-step inference with intermediate conclusions serving as essential premises for subsequent reasoning. Currently, the multi-hop reasoning approaches singly employ one of two techniques: LLM response-based fact verification and KG path-based chain construction. Nevertheless, the former excels at parallel fact-verification but underperforms on chained reasoning tasks, while the latter demonstrates proficiency in chained multi-hop reasoning but suffers from redundant path retrieval when handling parallel fact-verification reasoning. These limitations deteriorate the efficiency and accuracy for multi-hop QA tasks. To address this challenge, we propose a novel dual-track KG verification and reasoning framework DTKG, which is inspired by the Dual Process Theory in cognitive science. Specifically, DTKG comprises two main stages: the Classification Stage and the Branch Processing Stage.

LGJun 27, 2025
Score-Based Model for Low-Rank Tensor Recovery

Zhengyun Cheng, Changhao Wang, Guanwen Zhang et al.

Low-rank tensor decompositions (TDs) provide an effective framework for multiway data analysis. Traditional TD methods rely on predefined structural assumptions, such as CP or Tucker decompositions. From a probabilistic perspective, these can be viewed as using Dirac delta distributions to model the relationships between shared factors and the low-rank tensor. However, such prior knowledge is rarely available in practical scenarios, particularly regarding the optimal rank structure and contraction rules. The optimization procedures based on fixed contraction rules are complex, and approximations made during these processes often lead to accuracy loss. To address this issue, we propose a score-based model that eliminates the need for predefined structural or distributional assumptions, enabling the learning of compatibility between tensors and shared factors. Specifically, a neural network is designed to learn the energy function, which is optimized via score matching to capture the gradient of the joint log-probability of tensor entries and shared factors. Our method allows for modeling structures and distributions beyond the Dirac delta assumption. Moreover, integrating the block coordinate descent (BCD) algorithm with the proposed smooth regularization enables the model to perform both tensor completion and denoising. Experimental results demonstrate significant performance improvements across various tensor types, including sparse and continuous-time tensors, as well as visual data.

CVJan 23, 2025
PromptMono: Cross Prompting Attention for Self-Supervised Monocular Depth Estimation in Challenging Environments

Changhao Wang, Guanwen Zhang, Zhengyun Cheng et al.

Considerable efforts have been made to improve monocular depth estimation under ideal conditions. However, in challenging environments, monocular depth estimation still faces difficulties. In this paper, we introduce visual prompt learning for predicting depth across different environments within a unified model, and present a self-supervised learning framework called PromptMono. It employs a set of learnable parameters as visual prompts to capture domain-specific knowledge. To integrate prompting information into image representations, a novel gated cross prompting attention (GCPA) module is proposed, which enhances the depth estimation in diverse conditions. We evaluate the proposed PromptMono on the Oxford Robotcar dataset and the nuScenes dataset. Experimental results demonstrate the superior performance of the proposed method.

RONov 2, 2021
Trajectory Splitting: A Distributed Formulation for Collision Avoiding Trajectory Optimization

Changhao Wang, Jeffrey Bingham, Masayoshi Tomizuka

Efficient trajectory optimization is essential for avoiding collisions in unstructured environments, but it remains challenging to have both speed and quality in the solutions. One reason is that second-order optimality requires calculating Hessian matrices that can grow with $O(N^2)$ with the number of waypoints. Decreasing the waypoints can quadratically decrease computation time. Unfortunately, fewer waypoints result in lower quality trajectories that may not avoid the collision. To have both, dense waypoints and reduced computation time, we took inspiration from recent studies on consensus optimization and propose a distributed formulation of collocated trajectory optimization. It breaks a long trajectory into several segments, where each segment becomes a subproblem of a few waypoints. These subproblems are solved classically, but in parallel, and the solutions are fused into a single trajectory with a consensus constraint that enforces continuity of the segments through a consensus update. With this scheme, the quadratic complexity is distributed to each segment and enables solving for higher-quality trajectories with denser waypoints. Furthermore, the proposed formulation is amenable to using any existing trajectory optimizer for solving the subproblems. We compare the performance of our implementation of trajectory splitting against leading motion planning algorithms and demonstrate the improved computational efficiency of our method.

RONov 1, 2021
Safe Online Gain Optimization for Variable Impedance Control

Changhao Wang, Zhian Kuang, Xiang Zhang et al.

Smooth behaviors are preferable for many contact-rich manipulation tasks. Impedance control arises as an effective way to regulate robot movements by mimicking a mass-spring-damping system. Consequently, the robot behavior can be determined by the impedance gains. However, tuning the impedance gains for different tasks is tricky, especially for unstructured environments. Moreover, online adapting the optimal gains to meet the time-varying performance index is even more challenging. In this paper, we present Safe Online Gain Optimization for Variable Impedance Control (Safe OnGO-VIC). By reformulating the dynamics of impedance control as a control-affine system, in which the impedance gains are the inputs, we provide a novel perspective to understand variable impedance control. Additionally, we innovatively formulate an optimization problem with online collected force information to obtain the optimal impedance gains in real-time. Safety constraints are also embedded in the proposed framework to avoid unwanted collisions. We experimentally validated the proposed algorithm on three manipulation tasks. Comparison results with a constant gain baseline and an adaptive control method prove that the proposed algorithm is effective and generalizable to different scenarios.

ROOct 25, 2021
Learning Insertion Primitives with Discrete-Continuous Hybrid Action Space for Robotic Assembly Tasks

Xiang Zhang, Shiyu Jin, Changhao Wang et al.

This paper introduces a discrete-continuous action space to learn insertion primitives for robotic assembly tasks. Primitive is a sequence of elementary actions with certain exit conditions, such as "pushing down the peg until contact". Since the primitive is an abstraction of robot control commands and encodes human prior knowledge, it reduces the exploration difficulty and yields better learning efficiency. In this paper, we learn robot assembly skills via primitives. Specifically, we formulate insertion primitives as parameterized actions: hybrid actions consisting of discrete primitive types and continuous primitive parameters. Compared with the previous work using a set of discretized parameters for each primitive, the agent in our method can freely choose primitive parameters from a continuous space, which is more flexible and efficient. To learn these insertion primitives, we propose Twin-Smoothed Multi-pass Deep Q-Network (TS-MP-DQN), an advanced version of MP-DQN with twin Q-network to reduce the Q-value over-estimation. Extensive experiments are conducted in the simulation and real world for validation. From experiment results, our approach achieves higher success rates than three baselines: MP-DQN with parameterized actions, primitives with discrete parameters, and continuous velocity control. Furthermore, learned primitives are robust to sim-to-real transfer and can generalize to challenging assembly tasks such as tight round peg-hole and complex shaped electric connectors with promising success rates. Experiment videos are available at https://msc.berkeley.edu/research/insertion-primitives.html.

ROMay 15, 2021
Development of Soft Tactile Sensor for Force Measurement and Position Detection

Wu-Te Yang, Zhian Kuang, Changhao Wang et al.

As more robots are implemented for contact-rich tasks, tactile sensors are in increasing demand. For many circumstances, the contact is required to be compliant, and soft sensors are in need. This paper introduces a novelly designed soft sensor that can simultaneously estimate the contact force and contact location. Inspired by humans' skin, which contains multi-layers of receptors, the designed tactile sensor has a dual-layer structure. The first layer is made of a conductive fabric that is responsible for sensing the contact force. The second layer is composed of four small conductive rubbers that can detect the contact location. Signals from the two layers are firstly processed by Wheatstone bridges and amplifier circuits so that the measurement noises are eliminated, and the sensitivity is improved. An Arduino chip is used for processing the signal and analyzing the data. The contact force can be obtained by a pre-trained model that maps from the voltage to force, and the contact location is estimated by the voltage signal from the conductive rubbers in the second layer. In addition, filtering methods are applied to eliminate the estimation noise. Finally, experiments are provided to show the accuracy and robustness of the sensor.

ROJan 29, 2021
Contact Pose Identification for Peg-in-Hole Assembly under Uncertainties

Shiyu Jin, Xinghao Zhu, Changhao Wang et al.

Peg-in-hole assembly is a challenging contact-rich manipulation task. There is no general solution to identify the relative position and orientation between the peg and the hole. In this paper, we propose a novel method to classify the contact poses based on a sequence of contact measurements. When the peg contacts the hole with pose uncertainties, a tilt-then-rotate strategy is applied, and the contacts are measured as a group of patterns to encode the contact pose. A convolutional neural network (CNN) is trained to classify the contact poses according to the patterns. In the end, an admittance controller guides the peg towards the error direction and finishes the peg-in-hole assembly. Simulations and experiments are provided to show that the proposed method can be applied to the peg-in-hole assembly of different geometries. We also demonstrate the ability to alleviate the sim-to-real gap.