ROMar 18
DexEXO: A Wearability-First Dexterous Exoskeleton for Operator-Agnostic Demonstration and LearningAlvin Zhu, Mingzhang Zhu, Beom Jun Kim et al.
Scaling dexterous robot learning is constrained by the difficulty of collecting high-quality demonstrations across diverse operators. Existing wearable interfaces often trade comfort and cross-user adaptability for kinematic fidelity, while embodiment mismatch between demonstration and deployment requires visual post-processing before policy training. We present DexEXO, a wearability-first hand exoskeleton that aligns visual appearance, contact geometry, and kinematics at the hardware level. DexEXO features a pose-tolerant thumb mechanism and a slider-based finger interface analytically modeled to support hand lengths from 140~mm to 217~mm, reducing operator-specific fitting and enabling scalable cross-operator data collection. A passive hand visually matches the deployed robot, allowing direct policy training from raw wrist-mounted RGB observations. User studies demonstrate improved comfort and usability compared to prior wearable systems. Using visually aligned observations alone, we train diffusion policies that achieve competitive performance while substantially simplifying the end-to-end pipeline. These results show that prioritizing wearability and hardware-level embodiment alignment reduces both human and algorithmic bottlenecks without sacrificing task performance. Project Page: https://dexexo-research.github.io/
ROMar 14, 2025
Fast and Robust Localization for Humanoid Soccer Robot via Iterative Landmark MatchingRuochen Hou, Mingzhang Zhu, Hyunwoo Nam et al.
Accurate robot localization is essential for effective operation. Monte Carlo Localization (MCL) is commonly used with known maps but is computationally expensive due to landmark matching for each particle. Humanoid robots face additional challenges, including sensor noise from locomotion vibrations and a limited field of view (FOV) due to camera placement. This paper proposes a fast and robust localization method via iterative landmark matching (ILM) for humanoid robots. The iterative matching process improves the accuracy of the landmark association so that it does not need MCL to match landmarks to particles. Pose estimation with the outlier removal process enhances its robustness to measurement noise and faulty detections. Furthermore, an additional filter can be utilized to fuse inertial data from the inertial measurement unit (IMU) and pose data from localization. We compared ILM with Iterative Closest Point (ICP), which shows that ILM method is more robust towards the error in the initial guess and easier to get a correct matching. We also compared ILM with the Augmented Monte Carlo Localization (aMCL), which shows that ILM method is much faster than aMCL and even more accurate. The proposed method's effectiveness is thoroughly evaluated through experiments and validated on the humanoid robot ARTEMIS during RoboCup 2024 adult-sized soccer competition.
ROMar 12
WHED: A Wearable Hand Exoskeleton for Natural, High-Quality Demonstration CollectionMingzhang Zhu, Alvin Zhu, Jose Victor S. H. Ramos et al.
Scalable learning of dexterous manipulation remains bottlenecked by the difficulty of collecting natural, high-fidelity human demonstrations of multi-finger hands due to occlusion, complex hand kinematics, and contact-rich interactions. We present WHED, a wearable hand-exoskeleton system designed for in-the-wild demonstration capture, guided by two principles: wearability-first operation for extended use and a pose-tolerant, free-to-move thumb coupling that preserves natural thumb behaviors while maintaining a consistent mapping to the target robot thumb degrees of freedom. WHED integrates a linkage-driven finger interface with passive fit accommodation, a modified passive hand with robust proprioceptive sensing, and an onboard sensing/power module. We also provide an end-to-end data pipeline that synchronizes joint encoders, AR-based end-effector pose, and wrist-mounted visual observations, and supports post-processing for time alignment and replay. We demonstrate feasibility on representative grasping and manipulation sequences spanning precision pinch and full-hand enclosure grasps, and show qualitative consistency between collected demonstrations and replayed executions.