RO Dec 3, 2025 Code
Cross-embodied Co-design for Dexterous Hands
Kehlani Fay, Darin Anthony Djapri, Anya Zorin et al.
Dexterous manipulation is limited by both control and design, without consensus as to what makes manipulators best for performing dexterous tasks. This raises a fundamental challenge: how should we design and control robot manipulators that are optimized for dexterity? We present a co-design framework that learns task-specific hand morphology and complementary dexterous control policies. The framework supports 1) an expansive morphology search space including joint, finger, and palm generation, 2) scalable evaluation across the wide design space via morphology-conditioned cross-embodied control, and 3) real-world fabrication with accessible components. We evaluate the approach across multiple dexterous tasks, including in-hand rotation with simulation and real deployment. Our framework enables an end-to-end pipeline that can design, train, fabricate, and deploy a new robotic hand in under 24 hours. The full framework will be open-sourced and available on our website.
RO Aug 21, 2024
ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation
Shiqi Yang, Minghuan Liu, Yuzhe Qin et al.
Learning from demonstrations has been shown to be an effective approach to robotic manipulation, especially with the large-scale robot data recently collected with teleoperation systems. Building an efficient teleoperation system across diverse robot platforms has become more crucial than ever. However, there is a notable lack of cost-effective and user-friendly teleoperation systems for different end-effectors, e.g., anthropomorphic robot hands and grippers, that can operate across multiple platforms. To address this issue, we develop ACE, a cross-platform visual-exoskeleton system for low-cost dexterous teleoperation. Our system utilizes a hand-facing camera to capture 3D hand poses and an exoskeleton mounted on a portable base, enabling accurate real-time capture of both finger and wrist poses. Compared to previous systems, which often require hardware customization according to different robots, our single system can generalize to humanoid hands, arm-hand, arm-gripper, and quadruped-gripper systems with high-precision teleoperation. This enables imitation learning for complex manipulation tasks on diverse platforms.
73.2 RO May 21
TacO: Benchmarking Tactile Sensors for Object Manipulation
Anya Zorin, Zilin Si, Myungsun Park et al.
Vision-based learning from demonstrations has achieved remarkable success in enabling robots to perform manipulation tasks and high-level semantic reasoning, yet it remains insufficient for complex, contact-rich manipulation. While there is broad agreement that tactile sensing improves manipulation, there is no empirical guidance on which tactile sensors are best suited for which manipulation tasks. In this paper, we provide a systematic, task-driven evaluation of tactile sensors for robot manipulation and propose a framework for selecting and evaluating sensors based on manipulation policy performance. Separate manipulation policies are trained for tactile sensors of four distinct modalities (visual, acoustic, magnetic, and resistive) across three tasks: pick-and-place with unknown mass, object reorientation, and plug insertion. For each task, we analyze how sensor properties such as spatial resolution, shear sensing, and tactile representation, as well as inherent material friction, affect task performance. Rather than tactile sensing being universally beneficial in the same way, our results show that the usefulness of tactile information depends strongly on sensor modality, material properties, and the specific manipulation task. All of the tactile sensors, code, data, and hardware setup will be publicly available on the project website.
RO Dec 10, 2024
Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control
Chenhao Lu, Xuxin Cheng, Jialong Li et al.
Humanoid robots require both robust lower-body locomotion and precise upper-body manipulation. While recent Reinforcement Learning (RL) approaches provide whole-body loco-manipulation policies, they lack precise manipulation with high-DoF arms. In this paper, we propose decoupling upper-body control from locomotion, using inverse kinematics (IK) and motion retargeting for precise manipulation, while RL focuses on robust lower-body locomotion. We introduce PMP (Predictive Motion Priors), trained with a Conditional Variational Autoencoder (CVAE) to effectively represent upper-body motions. The locomotion policy is trained conditioned on this upper-body motion representation, ensuring that the system remains robust during both manipulation and locomotion. We show that CVAE features are crucial for stability and robustness, and that our approach significantly outperforms RL-based whole-body control in precise manipulation. With precise upper-body motion and robust lower-body locomotion control, operators can remotely control the humanoid to walk around and explore different environments while performing diverse manipulation tasks.
RO Mar 17, 2025
Humanoid Policy ~ Human Policy
Ri-Zhao Qiu, Shiqi Yang, Xuxin Cheng et al.
Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive, requiring expensive tele-operated data collection which is difficult to scale. This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embodiment training data for robot learning. We mitigate the embodiment gap between humanoids and humans from both the data and modeling perspectives. We collect an egocentric task-oriented dataset (PH2D) that is directly aligned with humanoid manipulation demonstrations. We then train a human-humanoid behavior policy, which we term Human Action Transformer (HAT). The state-action space of HAT is unified for both humans and humanoid robots and can be differentiably retargeted to robot actions. Co-trained with smaller-scale robot data, HAT directly models humanoid robots and humans as different embodiments without additional supervision. We show that human data improves both generalization and robustness of HAT with significantly better data collection efficiency. Code and data: https://human-as-robot.github.io/
97.9 RO Apr 23
Long-Horizon Manipulation via Trace-Conditioned VLA Planning
Isabella Liu, An-Chieh Cheng, Rui Yan et al.
Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated task-management VLM. The manager is decoupled from the executor and is invoked in a receding-horizon manner: given the current observation, it predicts a progress-aware remaining plan that combines (i) a subtask sequence with an explicit done + remaining split as lightweight language memory, and (ii) a visual trace -- a compact 2D keypoint trajectory prompt specifying where to go and what to approach next. The executor VLA is adapted to condition on the rendered trace, thereby turning long-horizon decision-making into repeated local control by following the trace. Crucially, predicting the remaining plan at each step yields an implicit closed loop: failed steps persist in subsequent outputs, and traces update accordingly, enabling automatic continuation and replanning without hand-crafted recovery logic or brittle visual-history buffers. Extensive experiments spanning embodied planning, long-horizon reasoning, trajectory prediction, and end-to-end manipulation in simulation and on a real Franka robot demonstrate strong gains in long-horizon success, robustness, and out-of-distribution generalization. Project page: https://www.liuisabella.com/LoHoManip
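The receding-horizon loop described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `Plan`, `manager`, `make_flaky_executor`, and `run` are hypothetical names, the manager is a trivial stand-in for the task-management VLM, the executor is a stub for the VLA policy, and the visual trace is omitted entirely. The sketch only shows how re-predicting the done/remaining split at every step makes a failed subtask reappear in the next plan without explicit recovery logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Plan:
    done: List[str]       # subtasks observed as completed
    remaining: List[str]  # subtasks still to execute, in order


def manager(goal_steps: List[str], completed: Set[str]) -> Plan:
    """Hypothetical stand-in for the manager: re-derive the plan from the current state."""
    done = [s for s in goal_steps if s in completed]
    remaining = [s for s in goal_steps if s not in completed]
    return Plan(done, remaining)


def make_flaky_executor(fail_once: str) -> Callable[[str], bool]:
    """Stub executor that fails the first attempt at one subtask, then succeeds."""
    attempts = {"n": 0}

    def execute(step: str) -> bool:
        if step == fail_once:
            attempts["n"] += 1
            return attempts["n"] > 1
        return True

    return execute


def run(goal_steps: List[str], execute: Callable[[str], bool],
        max_iters: int = 20) -> Tuple[bool, int]:
    """Replan before every step; return (success, number of executor calls)."""
    completed: Set[str] = set()
    for calls in range(max_iters):
        plan = manager(goal_steps, completed)
        if not plan.remaining:
            return True, calls
        step = plan.remaining[0]
        if execute(step):
            completed.add(step)
        # On failure nothing is recorded, so the step persists in the next plan.
    return False, max_iters


if __name__ == "__main__":
    print(run(["reach", "grasp", "place"], make_flaky_executor("grasp")))
```

In this toy setup the failed "grasp" attempt is simply retried on the next iteration because the manager still lists it as remaining, which mirrors the implicit closed loop described in the abstract.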
RO Sep 16, 2025
Object Pose Estimation through Dexterous Touch
Amir-Hossein Shahidzadeh, Jiyue Zhu, Kezhou Chen et al.
Robust object pose estimation is essential for manipulation and interaction tasks in robotics, particularly in scenarios where visual data is limited or sensitive to lighting, occlusions, and appearances. Tactile sensors often offer limited and local contact information, making it challenging to reconstruct the pose from partial data. Our approach uses sensorimotor exploration to actively control a robot hand to interact with the object. We train with Reinforcement Learning (RL) to explore and collect tactile data. The collected 3D point clouds are used to iteratively refine the object's shape and pose. In our setup, one hand holds the object steady while the other performs active exploration. We show that our method can actively explore an object's surface to identify critical pose features without prior knowledge of the object's geometry. Supplementary material and more demonstrations will be provided at https://amirshahid.github.io/BimanualTactilePose .
RO Feb 27, 2022
Configuration Control for Physical Coupling of Heterogeneous Robot Swarms
Sha Yi, Zeynep Temel, Katia Sycara
In this paper, we present a heterogeneous robot swarm system whose robots can physically couple with each other to form functional structures and dynamically decouple to perform individual tasks. The connection between robots can be formed with a passive coupling mechanism, ensuring minimum energy consumption during coupling and decoupling. The heterogeneity of the system enables the robots to perform structural enhancement configurations based on specific environmental requirements. We propose a connection-pair oriented configuration control algorithm to form different assemblies. We show experiments of up to nine robots performing the coupling, gap-crossing, and decoupling behaviors.
RO Feb 6, 2022
PuzzleBots: Physical Coupling of Robot Swarms
Sha Yi, Zeynep Temel, Katia Sycara
Robot swarms have been shown to improve the ability of individual robots through inter-robot collaboration. In this paper, we present PuzzleBots, a low-cost robotic swarm system where robots can physically couple with each other to form functional structures with minimum energy consumption while maintaining individual mobility to navigate within the environment. Each robot has knobs and holes along the sides of its body so that the robots can couple by inserting the knobs into the holes. We present the characterization of the knob design and the results of gap-crossing behavior with up to nine robots. We show with hardware experiments that the robots are able to couple with each other to cross gaps and decouple to perform individual tasks. We anticipate that PuzzleBots will be useful in unstructured environments, both as individuals and as coupled systems, in real-world applications.
RO Oct 3, 2019
Behavior Mixing with Minimum Global and Subgroup Connectivity Maintenance for Large-Scale Multi-Robot Systems
Wenhao Luo, Sha Yi, Katia Sycara
In many cases, multi-robot systems are expected to execute multiple behaviors simultaneously with different controllers, as well as sequences of behaviors in real time, which we call behavior mixing. Behavior mixing is accomplished when different subgroups of the overall robot team change their controllers to collectively achieve given tasks while maintaining connectivity within and across subgroups in one connected communication graph. In this paper, we present a provably minimum connectivity maintenance framework to ensure that the subgroups and the overall robot team stay connected at all times while providing the highest freedom for behavior mixing. In particular, we propose a real-time distributed Minimum Connectivity Constraint Spanning Tree (MCCST) algorithm to select the minimum inter-robot connectivity constraints preserving subgroup and global connectivity that are least likely to be violated by the original controllers. With the employed safety and connectivity barrier certificates for the activated connectivity constraints and collision avoidance, the behavior mixing controllers are thus minimally modified from the original controllers. We demonstrate the effectiveness and scalability of our approach via simulations of up to 100 robots with multiple behaviors.
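As a rough illustration of selecting a spanning tree of connectivity constraints, the sketch below uses a centralized Kruskal-style pass with union-find. It is not the distributed MCCST algorithm from the paper. The function name `select_connectivity_edges`, the edge weights (assumed here to be lower when a constraint is less likely to be violated), and the rule of processing intra-subgroup edges before inter-subgroup edges are all assumptions made for illustration.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (hypothetical violation weight, robot_i, robot_j)


def find(parent: List[int], x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def select_connectivity_edges(n_robots: int, edges: List[Edge],
                              subgroup: List[int]) -> List[Edge]:
    """Pick a spanning tree of communication edges to keep as constraints.

    Intra-subgroup edges are considered first so that each subgroup is
    connected by its own edges where possible; inter-subgroup edges then
    join the subgroups. Within each class, lower-weight edges come first.
    """
    parent = list(range(n_robots))
    ordered = sorted(edges, key=lambda e: (subgroup[e[1]] != subgroup[e[2]], e[0]))
    chosen: List[Edge] = []
    for w, i, j in ordered:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:          # edge joins two components: keep it as a constraint
            parent[ri] = rj
            chosen.append((w, i, j))
    return chosen


if __name__ == "__main__":
    # Four robots in two subgroups: {0, 1} and {2, 3}.
    groups = [0, 0, 1, 1]
    graph = [(0.9, 0, 1), (0.1, 1, 2), (0.2, 2, 3), (0.3, 0, 3)]
    print(select_connectivity_edges(4, graph, groups))
```

If the communication graph and each subgroup's induced graph are connected, the result has exactly one fewer edge than the number of robots, which is the minimum number of pairwise constraints needed to keep the team in one connected graph.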