CVSep 26, 2023
SEPT: Towards Efficient Scene Representation Learning for Motion PredictionZhiqian Lan, Yuxuan Jiang, Yao Mu et al.
Motion prediction is crucial for autonomous vehicles to operate safely in complex traffic environments. Extracting effective spatiotemporal relationships among traffic elements is key to accurate forecasting. Inspired by the successful practice of pretrained large language models, this paper presents SEPT, a modeling framework that leverages self-supervised learning to develop powerful spatiotemporal understanding for complex traffic scenes. Specifically, our approach involves three masking-reconstruction modeling tasks on scene inputs including agents' trajectories and road network, pretraining the scene encoder to capture kinematics within trajectory, spatial structure of road network, and interactions among roads and agents. The pretrained encoder is then finetuned on the downstream forecasting task. Extensive experiments demonstrate that SEPT, without elaborate architectural design or manual feature engineering, achieves state-of-the-art performance on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks, outperforming previous methods on all main metrics by a large margin.
ROApr 19Code
MM-Hand: A 21-DOF Multi-modal Modular Dexterous Robotic Hand with Remote ActuationZhuoheng Li, Qingquan Lin, Checheng Yu et al.
High-DOF dexterous hands require compact actuation, rich sensing, and reliable thermal behavior, but conventional designs often occupy valuable in-hand space, increase end-effector mass, and suffer from heat accumulation near the hand. Remote tendon-driven actuation offers an alternative by relocating motors to the robot base or an external motor hub, thereby freeing the fingers and palm for additional degrees of freedom, sensing modules, and maintainable mechanical structures. This paper presents MM-Hand, a 21-DOF Multimodal Modular dexterous hand based on remote tendon-driven actuation. The hand integrates spring-return tendon-driven fingers, modular 3D-printed finger and palm structures, quick tendon connectors for maintenance, and a multimodal sensing system including joint angle sensors, tactile sensors, motor-side feedback, and in-palm stereo vision. We further analyze tendon-sheath length variation and friction loss to guide the design of the routing, motor hub, and closed-loop joint control. Experiments validate the transmission, output force, sensing, and control capability of the system. The fingertip force reaches 25N under a 1m remote sheath transmission, demonstrating practical load capacity despite long-distance tendon routing. Closed-loop joint-level experiments further evaluate command tracking with a static arm and during arm motion. These results show that MM-Hand provides a lightweight, sensor-rich, and maintainable hardware platform for dexterous manipulation research. To support the community, all hardware designs and software frameworks are made fully open-source at https://mmlab.hk/research/MM-Hand.
ROApr 17, 2025Code
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital TwinsYao Mu, Tianxing Chen, Zanxin Chen et al.
In the rapidly advancing field of robotics, dual-arm coordination and complex object manipulation are essential capabilities for developing advanced autonomous systems. However, the scarcity of diverse, high-quality demonstration data and real-world-aligned evaluation benchmarks severely limits such development. To address this, we introduce RoboTwin, a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets and provide a real-world-aligned evaluation platform for dual-arm robotic tasks. Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios. It also introduces a spatial relation-aware code generation framework that combines object annotations with large language models to break down tasks, determine spatial constraints, and generate precise robotic movement code. Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and better alignment between simulated training and real-world performance. We validated our approach using the open-source COBOT Magic Robot platform. Policies pre-trained on RoboTwin-generated data and fine-tuned with limited real-world samples demonstrate significant potential for enhancing dual-arm robotic manipulation systems by improving success rates by over 70% for single-arm tasks and over 40% for dual-arm tasks compared to models trained solely on real-world data.
LGJul 21, 2024
Rocket Landing Control with Random Annealing Jump Start Reinforcement LearningYuxuan Jiang, Yujie Yang, Zhiqian Lan et al.
Rocket recycling is a crucial pursuit in aerospace technology, aimed at reducing costs and environmental impact in space exploration. The primary focus centers on rocket landing control, involving the guidance of a nonlinear underactuated rocket with limited fuel in real-time. This challenging task prompts the application of reinforcement learning (RL), yet goal-oriented nature of the problem poses difficulties for standard RL algorithms due to the absence of intermediate reward signals. This paper, for the first time, significantly elevates the success rate of rocket landing control from 8% with a baseline controller to 97% on a high-fidelity rocket model using RL. Our approach, called Random Annealing Jump Start (RAJS), is tailored for real-world goal-oriented problems by leveraging prior feedback controllers as guide policy to facilitate environmental exploration and policy learning in RL. In each episode, the guide policy navigates the environment for the guide horizon, followed by the exploration policy taking charge to complete remaining steps. This jump-start strategy prunes exploration space, rendering the problem more tractable to RL algorithms. The guide horizon is sampled from a uniform distribution, with its upper bound annealing to zero based on performance metrics, mitigating distribution shift and mismatch issues in existing methods. Additional enhancements, including cascading jump start, refined reward and terminal condition, and action smoothness regulation, further improve policy performance and practical applicability. The proposed method is validated through extensive evaluation and Hardware-in-the-Loop testing, affirming the effectiveness, real-time feasibility, and smoothness of the proposed controller.