RO | May 20
Mobile UMI: Cross-View Diffusion Policy with Decoupled Kinematics for Mobile Manipulation
Haoran Huang, Haonan Dong, Huixu Dong
Mobile imitation learning on portable demonstration interfaces faces two coupled bottlenecks: locomotion-contaminated action labels and inference-induced execution latency on a continuously moving base. Recent wrist-mounted interfaces lower the cost of tabletop data collection, yet a single wrist view does not capture the global context required for base navigation. Adding a body-mounted camera entangles human walking with hand motion. Meanwhile, generative policies introduce hundreds of milliseconds of inference latency, during which the base advances past predicted waypoints, forcing backward corrections at action splices. This paper presents Mobile UMI, a hardware-free demonstration framework that addresses both gaps through three components. First, a dual-camera capture system records chest-centric global context and wrist-centric local interaction without any robot present. Second, a one-shot ChArUco-based spatial anchor unifies the chest and hand visual-inertial frames; the hand pose is then re-expressed relative to the chest to extract decoupled SE(3) manipulation and SE(2) base trajectories. Third, an asynchronous receding-horizon executor performs online state matching: each generated action chunk is realigned with the current physical pose so that expired waypoints are discarded before execution. The full system is evaluated on four long-horizon household tasks, achieving an average success rate of 83.8% over 100 trials per task. Controlled comparisons against ACT and Diffusion Policy show that the chest-relative label alone closes much of the gap; online state matching closes the remainder. These results indicate that, for mobile imitation learning under the tested conditions, explicit kinematic factorization combined with state-level latency alignment provides an effective solution without requiring architectural changes to the underlying policy class.
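The two geometric steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nested-list 4x4 transform layout, and the nearest-waypoint rule used for state matching are all assumptions made for illustration. The first part re-expresses a hand pose relative to the chest frame; the second discards waypoints of an action chunk that the base has already passed.

```python
import math

def invert_rigid(T):
    """Invert a 4x4 rigid transform [[R, t], [0, 1]] stored as nested lists."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # R transposed
    t_inv = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def compose(A, B):
    """Matrix product of two 4x4 transforms."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def hand_in_chest_frame(T_world_chest, T_world_hand):
    """Hand pose relative to the chest: inv(T_world_chest) * T_world_hand."""
    return compose(invert_rigid(T_world_chest), T_world_hand)

def drop_expired_waypoints(chunk, current_xy):
    """Assumed matching rule: keep waypoints from the one nearest to the
    current base position onward; earlier ones are treated as expired."""
    dists = [math.hypot(x - current_xy[0], y - current_xy[1]) for x, y in chunk]
    return chunk[dists.index(min(dists)):]

# Chest at (1, 0, 0), rotated 90 degrees about z; hand at (1, 1, 0) in the world.
T_world_chest = [[0.0, -1.0, 0.0, 1.0],
                 [1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]]
T_world_hand = [[1.0, 0.0, 0.0, 1.0],
                [0.0, 1.0, 0.0, 1.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]]
T_chest_hand = hand_in_chest_frame(T_world_chest, T_world_hand)

# A base action chunk generated while the robot kept moving.
chunk = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
remaining = drop_expired_waypoints(chunk, current_xy=(0.19, 0.0))
```

In this toy case the hand sits one unit along the chest's own x-axis regardless of where the chest is in the world, which is the sense in which a chest-relative label is free of walking motion. The waypoint filter is only a stand-in for the paper's online state matching, whose exact matching criterion the abstract does not specify.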
RO | May 17
AffordVLA: Injecting Affordance Representations into Vision-Language-Action Models via Implicit Feature Alignment
Weijie Kong, Zhian Su, Wei Yu et al.
Recent advances in Vision-Language-Action (VLA) models have shown strong potential for general-purpose robotic manipulation. However, the visual representations of most VLA models are dominated by global object appearance and struggle to focus on task-relevant functional interaction regions, which limits their robustness in unstructured environments. Existing affordance-based methods typically rely on explicit mask injection or external perception modules, which require additional annotations and introduce cascading perception errors and inference overhead. To address these limitations, we propose AffordVLA, an affordance-enhanced VLA framework that internalizes manipulation-centric affordance perception into VLA visual representations through implicit representation alignment. Specifically, we construct a zero-shot affordance teacher to extract task-conditioned affordance visual representations from RGB observations and language instructions. AffordVLA aligns the intermediate visual representations of the VLA with those extracted by the teacher, thereby implicitly injecting manipulation-centric affordance perception into the VLA and improving action accuracy. Extensive simulation and real-world experiments demonstrate that AffordVLA and its affordance teacher achieve state-of-the-art performance and outperform strong baselines. Ablation analyses show that AffordVLA effectively reshapes VLA visual representations while preserving inference efficiency, leading to improved manipulation success rates and training efficiency.
RO | Apr 7 | Code
Delta6: A Low-Cost, 6-DOF Force-Sensing Flexible End-Effector
Yue Feng, Weicheng Huang, Chen Qiu et al.
This paper presents Delta6, a low-cost, six-degree-of-freedom (6-DOF) force/torque end-effector that combines antagonistic springs with magnetic encoders to deliver accurate wrench sensing while remaining as simple to assemble as flat-pack furniture. A fully 3D-printed prototype, assembled entirely from off-the-shelf parts, withstands peak forces above ±14.4 N and torques of ±0.33 N·m per axis; these limits can be further extended by leveraging the proposed parametric analytical model. Without calibration, Delta6 attains a 99th-percentile error of 7% full scale (FS). With lightweight sequence models, the error is reduced to 3.8% FS by the best-performing network. Benchmarks on multiple computing platforms confirm that the device's bandwidth is adjustable, enabling balanced trade-offs among update rate, accuracy, and cost, while durability, thermal drift, and zero-calibration tests confirm its robustness. With Delta6 mounted on a robot arm governed by a force-impedance controller, the system successfully performs two contact-rich tasks: buffing curved surfaces and performing tight assemblies. Experiments validate the design, showing that Delta6 is a robust, low-cost alternative to existing 6-DOF force sensing solutions. Open-source site: https://wings-robotics.github.io/delta6
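For readers unfamiliar with the "99th-percentile error in % FS" figure quoted above, the following sketch shows one common way such a number can be computed. The nearest-rank percentile rule, the function name, and the synthetic data are assumptions; the abstract does not state which percentile definition or full-scale value the authors used.

```python
def p99_error_percent_fs(measured, reference, full_scale):
    """99th-percentile absolute error as a percentage of full scale.

    Uses the nearest-rank percentile definition (an assumption here).
    """
    errors = sorted(abs(m - r) for m, r in zip(measured, reference))
    n = len(errors)
    rank = -(-99 * n // 100)  # ceil(0.99 * n) in exact integer arithmetic
    return 100.0 * errors[rank - 1] / full_scale

# Synthetic example: 100 readings whose errors grow from 0.00 to 0.99 N,
# evaluated against a hypothetical 14 N full-scale range.
reference = [0.0] * 100
measured = [i * 0.01 for i in range(100)]
p99 = p99_error_percent_fs(measured, reference, full_scale=14.0)
```

Reporting a high percentile rather than the mean makes the figure sensitive to occasional large deviations, which matters for contact-rich tasks where a single bad wrench estimate can destabilize a controller.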
RO | May 13
SID: Sliding into Distribution for Robust Few-Demonstration Manipulation
Yicheng Ma, Wei Yu, Zhian Su et al.
Generalizing robotic manipulation across object poses, viewpoints, and dynamic disturbances is difficult, especially with only a few demonstrations. End-to-end visuomotor policies are expressive but data-hungry, while planning and optimization satisfy explicit constraints but do not directly capture the interaction strategies demonstrated by humans. We propose Sliding into Distribution (SID), a structured framework that learns an object-centric motion field from canonicalized demonstrations to iteratively slide the system toward the demonstrated manifold and into the reliable operating region of a lightweight egocentric execution policy, mitigating out-of-distribution (OOD) execution. The motion field provides large corrective motions when far from the demonstration manifold and naturally vanishes near convergence, enabling robust reaching under substantial pose and viewpoint shifts. Within the reached regime, an egocentric policy trained with conditioned flow matching performs task-specific manipulation, supported by kinematically consistent point-cloud reprojection augmentation that preserves action-observation consistency. Across six real-world tasks, SID achieves approximately 90% success under OOD initializations with only two demonstrations, with under a 10% drop under distractors and external disturbances. Overall, SID provides a new paradigm for few-shot manipulation: explicitly managing distribution shift via online distribution recovery.
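The behavior attributed to the motion field, large corrections far from the demonstrations and vanishing motion near convergence, can be illustrated with a toy sketch. This is not SID's learned object-centric field: the hand-written proportional field, the 2D points standing in for the demonstration manifold, and the names `slide` and `gain` are all assumptions.

```python
import math

def nearest_demo_point(x, demo_points):
    """Closest demonstrated point to x (stand-in for the demo manifold)."""
    return min(demo_points, key=lambda d: math.dist(x, d))

def slide(x, demo_points, gain=0.5, steps=40):
    """Iteratively move x toward the demonstrations.

    Each step is proportional to the remaining offset, so steps are large
    when far away and shrink toward zero near convergence.
    """
    x = tuple(x)
    step_sizes = []
    for _ in range(steps):
        target = nearest_demo_point(x, demo_points)
        step = tuple(gain * (t - xi) for t, xi in zip(target, x))
        step_sizes.append(math.hypot(*step))
        x = tuple(xi + si for xi, si in zip(x, step))
    return x, step_sizes

demo_points = [(0.0, 0.0), (1.0, 0.0)]
final, step_sizes = slide((1.2, 3.0), demo_points)
```

Starting well outside the demonstrated region, the state ends up next to the nearest demonstrated point, with step sizes decaying geometrically. In the framework described above, a separate egocentric policy would take over once the state is inside its reliable operating region.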
RO | Nov 7, 2021
GSG: A Granary Soft Gripper with Mechanical Force Sensing via 3-Dimensional Snap-Through Structure
Huixu Dong, Chao-Yu Chen, Chen Qiu et al.
Grasping is an essential capability for most robots in practical applications. Soft robotic grippers are considered a critical part of robotic grasping and have attracted considerable attention for their high compliance and robustness to variance in object geometry; however, they are still limited by their sensing capabilities and actuation mechanisms. We propose a novel soft gripper, shaped like a granary, with a compliant snap-through bistable mechanism fabricated by integrated mold technology, achieving sensing and actuation purely mechanically. In particular, the snap-through bistable structure allows us to reduce the complexity of the mechanism, control, and sensing designs, since the grasping and sensing behaviors are completely passive. Grasping is triggered automatically once the trigger position of the gripper touches an object and applies sufficient force. To grasp objects with various profiles, the proposed granary soft gripper (GSG) is designed to be capable of enveloping, pinching, and caging grasps. The gripper consists of a chamber palm, a palm cap, and three fingers. First, the design of the gripper is analyzed. Then, after the theoretical model is constructed, finite element (FE) simulations are conducted to verify it. Finally, a series of grasping experiments is carried out to assess the snap-through behavior of the proposed gripper in grasping and sensing. The experimental results show that the proposed gripper can manipulate a variety of soft and rigid objects and remains stable under external disturbances.
RO | Jun 28, 2021
Real-time Human-Robot Collaborative Manipulations of Cylindrical and Cubic Objects via Geometric Primitives and Depth Information
Huixu Dong, Jiadong Zhou, Haoyong Yu
Many objects commonly found in household and industrial environments have cylindrical or cubic shapes. Robots can therefore manipulate them through real-time detection of the elliptic and rectangular shape primitives formed by the circular and rectangular tops of these objects. We devise a robust grasping system that enables a robot to manipulate cylindrical and cubic objects in collaboration scenarios using the proposed perception strategy, which combines the detection of elliptic and rectangular shape primitives with depth information. The proposed method of detecting ellipses and rectangles incorporates a one-stage detection backbone and then applies the proposed adaptive multi-branch multi-scale net, with a designed iterative feature pyramid network, local inception net, and multi-receptive-field feature fusion net, to generate object detection recommendations. To manipulate objects with different shapes, we propose a grasp synthesis method that aligns the grasp pose of the gripper with an object's pose based on the proposed detector and registered depth information. The proposed robotic perception algorithm has been integrated on a robot to demonstrate the ability to carry out human-robot collaborative manipulations of cylindrical and cubic objects in real time. We show that the robotic manipulator, empowered by the proposed detector, performs well in practical manipulation scenarios. (An experiment video is available on YouTube: https://www.youtube.com/watch?v=Amcs8lwvNK8)
RO | Oct 19, 2018
Enabling Grasp Action: Generalized Evaluation of Grasp Stability via Contact Stiffness from Contact Mechanics Insight
Huixu Dong, Chen Qiu, Dilip K. Prasad et al.
Performing a grasp is a pivotal capability for a robotic gripper. We propose a new approach to evaluating grasp stability by constructing a model of grasping stiffness based on the theory of contact mechanics. First, mathematical models are built to explore soft contact and the general grasp stiffness between a finger and an object. Next, the grasping stiffness matrix is constructed to reflect the normal, tangential, and torsional stiffness coefficients. Finally, we design two grasping cases to verify the proposed measurement criterion of grasping stability by comparing different grasping configurations. Specifically, a standard grasping index is compared with the minimum eigenvalue index of the constructed grasping stiffness matrix. The comparison reveals a similar tendency between the two indices in measuring grasping stability and thus validates the proposed approach.
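The minimum eigenvalue index mentioned above can be illustrated with a short sketch: given a symmetric stiffness matrix, the smallest eigenvalue measures stiffness along the weakest direction, so a larger value suggests a more stable grasp. The matrix entries below are made up, and the Jacobi eigenvalue routine is a generic textbook method chosen to keep the example dependency-free; neither comes from the paper.

```python
import math

def symmetric_eigenvalues(A, tol=1e-12, max_rotations=100):
    """Eigenvalues of a symmetric matrix (n >= 2) via Jacobi rotations."""
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(max_rotations):
        # Pick the largest off-diagonal entry and stop once it is negligible.
        p, q = max(pairs, key=lambda ij: abs(A[ij[0]][ij[1]]))
        if abs(A[p][q]) < tol:
            break
        theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
        c, s = math.cos(theta), math.sin(theta)
        J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
        AJ = [[sum(A[i][k] * J[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        A = [[sum(J[k][i] * AJ[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]  # A <- J^T A J zeroes the (p, q) entry
    return sorted(A[i][i] for i in range(n))

def min_eigenvalue_index(K):
    """Stability index: smallest eigenvalue of the stiffness matrix K."""
    return symmetric_eigenvalues(K)[0]

# Hypothetical stiffness matrices for two grasp configurations.
K_a = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 5.0]]
K_b = [[4.0, 1.0, 0.0], [1.0, 4.0, 0.0], [0.0, 0.0, 5.0]]
index_a = min_eigenvalue_index(K_a)  # eigenvalues of K_a are 1, 3, 5
index_b = min_eigenvalue_index(K_b)  # eigenvalues of K_b are 3, 5, 5
```

Under this index, configuration b would be ranked as the more stable of the two because its weakest direction is stiffer.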
CV | Sep 12, 2018
Are object detection assessment criteria ready for maritime computer vision?
Dilip K. Prasad, Huixu Dong, Deepu Rajan et al.
Maritime vessels equipped with visible and infrared cameras can complement other conventional sensors for object detection. However, the application of computer vision techniques in the maritime domain has received attention only recently. The maritime environment has its own unique requirements and challenges. Assessment of the quality of detections is a fundamental need in computer vision. However, the conventional assessment metrics suitable for general object detection are deficient in the maritime setting. Thus, a large body of related work in computer vision appears inapplicable to the maritime setting at first sight. We discuss the problem of defining assessment metrics suitable for maritime computer vision and consider new bottom edge proximity metrics for this purpose. These metrics indicate that existing computer vision approaches are indeed promising for maritime computer vision and can play a foundational role in this emerging field.