Zichen He

RO
h-index: 12
6 papers
148 citations
Novelty: 42%
AI Score: 44

6 Papers

RO Mar 23
BiPreManip: Learning Affordance-Based Bimanual Preparatory Manipulation through Anticipatory Collaboration

Yan Shen, Feng Jiang, Zichen He et al.

Many everyday objects are difficult to directly grasp (e.g., a flat iPad) or manipulate functionally (e.g., opening the cap of a pen lying on a desk). Such tasks require sequential, asymmetric coordination between two arms, where one arm performs preparatory manipulation that enables the other's goal-directed action - for instance, pushing the iPad to the table's edge before picking it up, or lifting the pen body to allow the other hand to remove its cap. In this work, we introduce Collaborative Preparatory Manipulation, a class of bimanual manipulation tasks that demand understanding object semantics and geometry, anticipating spatial relationships, and planning long-horizon coordinated actions between the two arms. To tackle this challenge, we propose a visual affordance-based framework that first envisions the final goal-directed action and then guides one arm to perform a sequence of preparatory manipulations that facilitate the other arm's subsequent operation. This affordance-centric representation enables anticipatory inter-arm reasoning and coordination, generalizing effectively across various objects spanning diverse categories. Extensive experiments in both simulation and the real world demonstrate that our approach substantially improves task success rates and generalization compared to competitive baselines.

RO Nov 29, 2022
Multi-robot Social-aware Cooperative Planning in Pedestrian Environments Using Multi-agent Reinforcement Learning

Zichen He, Chunwei Song, Lu Dong

Safe and efficient co-planning of multiple robots in environments shared with pedestrians has promising applications. In this work, we propose a novel social-aware, efficient multi-robot cooperative planner based on off-policy multi-agent reinforcement learning (MARL) under partial, dimension-varying observation and imperfect perception conditions. We adopt a temporal-spatial graph (TSG)-based social encoder to better extract the importance of the social relations between each robot and the pedestrians in its field of view (FOV). We also introduce a K-step lookahead reward setting in the multi-robot RL framework to avoid aggressive, intrusive, short-sighted, and unnatural motion decisions by the robots. Moreover, we improve the traditional centralized critic network with a multi-head global attention module that better aggregates local observation information among different robots to guide individual policy updates. Finally, multi-group experimental results verify the effectiveness of the proposed cooperative motion planner.
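
The abstract does not define the K-step lookahead reward, so the following is only a minimal sketch of one plausible reading: roll the robot and each pedestrian forward for K steps and penalize predicted intrusions into a pedestrian's comfort zone. The constant-velocity prediction, the function name `k_step_lookahead_reward`, and the parameters `comfort_radius`, `dt`, and `gamma` are all assumptions made for illustration, not the authors' formulation.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def k_step_lookahead_reward(
    robot_pos: Vec,
    robot_vel: Vec,
    ped_pos: Sequence[Vec],
    ped_vel: Sequence[Vec],
    k: int = 5,
    dt: float = 0.25,
    comfort_radius: float = 0.5,
    gamma: float = 0.9,
) -> float:
    """Hypothetical lookahead penalty (not the paper's definition).

    Predicts positions k steps ahead under an assumed constant-velocity
    model and accumulates a discounted penalty whenever the robot is
    predicted to come closer than comfort_radius to a pedestrian.
    """
    reward = 0.0
    for step in range(1, k + 1):
        t = step * dt
        # Predicted robot position at lookahead time t.
        rx = robot_pos[0] + robot_vel[0] * t
        ry = robot_pos[1] + robot_vel[1] * t
        for (px, py), (vx, vy) in zip(ped_pos, ped_vel):
            # Predicted distance to this pedestrian at time t.
            dist = math.hypot(rx - (px + vx * t), ry - (py + vy * t))
            if dist < comfort_radius:
                # Deeper and earlier intrusions are penalized more.
                reward -= gamma ** (step - 1) * (comfort_radius - dist)
    return reward


# A head-on approach is penalized; a pedestrian moving away is not.
head_on = k_step_lookahead_reward((0.0, 0.0), (1.0, 0.0), [(2.0, 0.0)], [(-1.0, 0.0)])
moving_away = k_step_lookahead_reward((0.0, 0.0), (1.0, 0.0), [(2.0, 0.0)], [(1.0, 0.0)])
```

A term of this kind would be added to the usual goal-reaching and collision rewards, so the policy is discouraged from actions that look safe now but lead to an intrusion a few steps later.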

RO Jun 23, 2025
MinD: Learning A Dual-System World Model for Real-Time Planning and Implicit Risk Analysis

Xiaowei Chi, Kuangzhi Ge, Jiaming Liu et al.

Video Generation Models (VGMs) have become powerful backbones for Vision-Language-Action (VLA) models, leveraging large-scale pretraining for robust dynamics modeling. However, current methods underutilize their distribution modeling capabilities for predicting future states. Two challenges hinder progress: integrating generative processes into feature learning is both technically and conceptually underdeveloped, and naive frame-by-frame video diffusion is computationally inefficient for real-time robotics. To address these, we propose Manipulate in Dream (MinD), a dual-system world model for real-time, risk-aware planning. MinD uses two asynchronous diffusion processes: a low-frequency visual generator (LoDiff) that predicts future scenes and a high-frequency diffusion policy (HiDiff) that outputs actions. Our key insight is that robotic policies do not require fully denoised frames but can rely on low-resolution latents generated in a single denoising step. To connect early predictions to actions, we introduce DiffMatcher, a video-action alignment module with a novel co-training strategy that synchronizes the two diffusion models. MinD achieves a 63% success rate on RL-Bench, 60% on real-world Franka tasks, and operates at 11.3 FPS, demonstrating the efficiency of single-step latent features for control signals. Furthermore, MinD identifies 74% of potential task failures in advance, providing real-time safety signals for monitoring and intervention. This work establishes a new paradigm for efficient and reliable robotic manipulation using generative world models.
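
The two-rate structure described in this abstract (a slow predictor whose latest output feeds a fast policy) can be illustrated with a toy loop. This is only a sketch of the scheduling idea under simplifying assumptions: it runs in a single thread, and `TwoRateLoop`, `slow_predict`, `fast_policy`, and `slow_period` are hypothetical stand-ins that involve no diffusion models and do not reflect MinD's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TwoRateLoop:
    """Toy two-rate control loop (illustrative only)."""

    slow_period: int  # assumed number of fast ticks per slow update
    latent: float = 0.0
    slow_updates: int = 0

    def slow_predict(self, observation: float) -> float:
        # Stand-in for a low-frequency predictor: returns a coarse
        # "future" feature derived from the current observation.
        self.slow_updates += 1
        return observation + 1.0

    def fast_policy(self, observation: float, latent: float) -> float:
        # Stand-in for a high-frequency policy: acts on the newest
        # observation and the most recent (possibly stale) latent.
        return latent - observation

    def run(self, observations: List[float]) -> List[float]:
        actions = []
        for tick, obs in enumerate(observations):
            # The slow module refreshes its latent only every slow_period ticks.
            if tick % self.slow_period == 0:
                self.latent = self.slow_predict(obs)
            # The fast module emits an action on every tick.
            actions.append(self.fast_policy(obs, self.latent))
        return actions


loop = TwoRateLoop(slow_period=4)
actions = loop.run([float(i) for i in range(8)])
```

The point of the sketch is that the action rate is decoupled from the prediction rate: eight actions are produced from only two predictor calls, with the policy reusing the last latent in between.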

NC Jul 3, 2025
NLP4Neuro: Sequence-to-sequence learning for neural population decoding

Jacob J. Morra, Kaitlyn E. Fouke, Kexin Hang et al.

Delineating how animal behavior arises from neural activity is a foundational goal of neuroscience. However, as the computations underlying behavior unfold in networks of thousands of individual neurons across the entire brain, this presents challenges for investigating neural roles and computational mechanisms in large, densely wired mammalian brains during behavior. Transformers, the backbones of modern large language models (LLMs), have become powerful tools for neural decoding from smaller neural populations. These modern LLMs have benefited from extensive pre-training, and their sequence-to-sequence learning has been shown to generalize to novel tasks and data modalities, which may also confer advantages for neural decoding from larger, brain-wide activity recordings. Here, we present a systematic evaluation of off-the-shelf LLMs to decode behavior from brain-wide populations, termed NLP4Neuro, which we used to test LLMs on simultaneous calcium imaging and behavior recordings in larval zebrafish exposed to visual motion stimuli. Through NLP4Neuro, we found that LLMs become better at neural decoding when they use pre-trained weights learned from textual natural language data. Moreover, we found that a recent mixture-of-experts LLM, DeepSeek Coder-7b, significantly improved behavioral decoding accuracy, predicted tail movements over long timescales, and provided anatomically consistent highly interpretable readouts of neuron salience. NLP4Neuro demonstrates that LLMs are highly capable of informing brain-wide neural circuit dissection.

RO Dec 13, 2021
Multi-agent Soft Actor-Critic Based Hybrid Motion Planner for Mobile Robots

Zichen He, Lu Dong, Chunwei Song et al.

In this paper, a novel hybrid multi-robot motion planner that can be applied under non-communication and locally observable conditions is presented. The planner is model-free and realizes an end-to-end mapping from multi-robot state and observation information to final smooth and continuous trajectories. The planner has a separated front-end and back-end architecture. The front-end collaborative waypoint search module is based on the multi-agent soft actor-critic algorithm under the centralized training with decentralized execution paradigm. The back-end trajectory optimization module is based on the minimum snap method with safety zone constraints. This module outputs the final dynamically feasible and executable trajectories. Finally, multi-group experimental results verify the effectiveness of the proposed motion planner.
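
For intuition about the minimum snap back end, the simplest case has a closed form: a single one-dimensional segment that starts and ends at rest (zero velocity, acceleration, and jerk at both ends). Minimizing the integral of squared snap under those eight boundary conditions yields a 7th-degree polynomial. The sketch below evaluates it; the function name `min_snap_rest_to_rest` is hypothetical, and the safety zone constraints and multi-waypoint optimization mentioned in the abstract are deliberately left out.

```python
def min_snap_rest_to_rest(start: float, goal: float, duration: float, t: float) -> float:
    """Position at time t on a single rest-to-rest minimum-snap segment.

    With zero velocity, acceleration, and jerk at both endpoints, the
    minimizer of the integrated squared snap is the 7th-degree blend
    s(tau) = 35 tau^4 - 84 tau^5 + 70 tau^6 - 20 tau^7, tau = t / duration.
    This sketch covers one axis and one segment only, with no obstacle
    or safety zone constraints.
    """
    if duration <= 0.0:
        raise ValueError("duration must be positive")
    # Clamp normalized time so queries outside the segment hold the endpoints.
    tau = min(max(t / duration, 0.0), 1.0)
    s = tau ** 4 * (35.0 - 84.0 * tau + 70.0 * tau ** 2 - 20.0 * tau ** 3)
    return start + (goal - start) * s


# Sample the segment from 0.0 to 2.0 over 4 seconds.
samples = [min_snap_rest_to_rest(0.0, 2.0, 4.0, 0.5 * i) for i in range(9)]
```

The derivative of the blend is 140 tau^3 (1 - tau)^3, which is why velocity, acceleration, and jerk all vanish at both ends. A multi-segment planner would instead solve a quadratic program over the polynomial coefficients of every segment, with waypoint and corridor constraints added.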

RO Aug 31, 2021
A review of mobile robot motion planning methods: from classical motion planning workflows to reinforcement learning-based architectures

Lu Dong, Zichen He, Chunwei Song et al.

Motion planning is critical to the autonomous operation of mobile robots. As the complexity and randomness of robot application scenarios increase, the planning capability of classical hierarchical motion planners is challenged. With the development of machine learning, deep reinforcement learning (DRL)-based motion planners have gradually become a research hotspot due to several advantageous features. DRL-based motion planners are model-free and do not rely on a prior structured map. Most importantly, they unify the global planner and the local planner. In this paper, we provide a systematic review of various motion planning methods. First, we summarize the representative and state-of-the-art works for each submodule of the classical motion planning architecture and analyze their performance features. Subsequently, we concentrate on summarizing RL-based motion planning approaches, including motion planners combined with RL improvements, map-free RL-based motion planners, and multi-robot cooperative planning methods. Finally, we analyze in detail the urgent challenges faced by these mainstream RL-based motion planners, review some state-of-the-art works addressing these issues, and propose suggestions for future research.