Shiqi Zhao

RO
h-index72
14papers
1,511citations
Novelty45%
AI Score47

14 Papers

ROSep 9, 2022Code
General Place Recognition Survey: Towards the Real-world Autonomy Age

Peng Yin, Shiqi Zhao, Ivan Cisneros et al. · cmu

Place recognition is the fundamental module that can assist Simultaneous Localization and Mapping (SLAM) in loop-closure detection and re-localization for long-term navigation. The place recognition community has made astonishing progress over the last $20$ years, and this has attracted widespread research interest and application in multiple fields such as computer vision and robotics. However, few methods have shown promising place recognition performance in complex real-world scenarios, where long-term and large-scale appearance changes usually result in failures. Additionally, there is a lack of an integrated framework amongst the state-of-the-art methods that can handle all of the challenges in place recognition, which include appearance changes, viewpoint differences, robustness to unknown areas, and efficiency in real-world applications. In this work, we survey the state-of-the-art methods that target long-term localization and discuss future directions and opportunities. We start by investigating the formulation of place recognition in long-term autonomy and the major challenges in real-world environments. We then review the recent works in place recognition for different sensor modalities and current strategies for dealing with various place recognition challenges. Finally, we review the existing datasets for long-term localization and introduce our datasets and evaluation API for different approaches. This paper can be a tutorial for researchers new to the place recognition community and those who care about long-term robotics autonomy. We also provide our opinion on the frequently asked question in robotics: Do robots need accurate localization for long-term autonomy? A summary of this work and our datasets and evaluation API is publicly available to the robotics community at: https://github.com/MetaSLAM/GPRS.

ROAug 30, 2022
BioSLAM: A Bio-inspired Lifelong Memory System for General Place Recognition

Peng Yin, Abulikemu Abuduweili, Shiqi Zhao et al. · cmu

We present BioSLAM, a lifelong SLAM framework for learning various new appearances incrementally and maintaining accurate place recognition for previously visited areas. Unlike humans, artificial neural networks suffer from catastrophic forgetting and may forget the previously visited areas when trained with new arrivals. For humans, researchers discover that there exists a memory replay mechanism in the brain to keep the neuron active for previous events. Inspired by this discovery, BioSLAM designs a gated generative replay to control the robot's learning behavior based on the feedback rewards. Specifically, BioSLAM provides a novel dual-memory mechanism for maintenance: 1) a dynamic memory to efficiently learn new observations and 2) a static memory to balance new-old knowledge. When combined with a visual-/LiDAR- based SLAM system, the complete processing pipeline can help the agent incrementally update the place recognition ability, robust to the increasing complexity of long-term place recognition. We demonstrate BioSLAM in two incremental SLAM scenarios. In the first scenario, a LiDAR-based agent continuously travels through a city-scale environment with a 120km trajectory and encounters different types of 3D geometries (open streets, residential areas, commercial buildings). We show that BioSLAM can incrementally update the agent's place recognition ability and outperform the state-of-the-art incremental approach, Generative Replay, by 24%. In the second scenario, a LiDAR-vision-based agent repeatedly travels through a campus-scale area on a 4.5km trajectory. BioSLAM can guarantee the place recognition accuracy to outperform 15\% over the state-of-the-art approaches under different appearances. To our knowledge, BioSLAM is the first memory-enhanced lifelong SLAM system to help incremental place recognition in long-term navigation tasks.

ROJul 14, 2022
AutoMerge: A Framework for Map Assembling and Smoothing in City-scale Environments

Peng Yin, Haowen Lai, Shiqi Zhao et al.

We present AutoMerge, a LiDAR data processing framework for assembling a large number of map segments into a complete map. Traditional large-scale map merging methods are fragile to incorrect data associations, and are primarily limited to working only offline. AutoMerge utilizes multi-perspective fusion and adaptive loop closure detection for accurate data associations, and it uses incremental merging to assemble large maps from individual trajectory segments given in random order and with no initial estimations. Furthermore, after assembling the segments, AutoMerge performs fine matching and pose-graph optimization to globally smooth the merged map. We demonstrate AutoMerge on both city-scale merging (120km) and campus-scale repeated merging (4.5km x 8). The experiments show that AutoMerge (i) surpasses the second- and third- best methods by 14% and 24% recall in segment retrieval, (ii) achieves comparable 3D mapping accuracy for 120 km large-scale map assembly, (iii) and it is robust to temporally-spaced revisits. To the best of our knowledge, AutoMerge is the first mapping approach that can merge hundreds of kilometers of individual segments without the aid of GPS.

CVJul 6, 2022
SphereVLAD++: Attention-based and Signal-enhanced Viewpoint Invariant Descriptor

Shiqi Zhao, Peng Yin, Ge Yi et al.

LiDAR-based localization approach is a fundamental module for large-scale navigation tasks, such as last-mile delivery and autonomous driving, and localization robustness highly relies on viewpoints and 3D feature extraction. Our previous work provides a viewpoint-invariant descriptor to deal with viewpoint differences; however, the global descriptor suffers from a low signal-noise ratio in unsupervised clustering, reducing the distinguishable feature extraction ability. We develop SphereVLAD++, an attention-enhanced viewpoint invariant place recognition method in this work. SphereVLAD++ projects the point cloud on the spherical perspective for each unique area and captures the contextual connections between local features and their dependencies with global 3D geometry distribution. In return, clustered elements within the global descriptor are conditioned on local and global geometries and support the original viewpoint-invariant property of SphereVLAD. In the experiments, we evaluated the localization performance of SphereVLAD++ on both public KITTI360 datasets and self-generated datasets from the city of Pittsburgh. The experiment results show that SphereVLAD++ outperforms all relative state-of-the-art 3D place recognition methods under small or even totally reversed viewpoint differences and shows 0.69% and 15.81% successful retrieval rates with better than the second best. Low computation requirements and high time efficiency also help its application for low-cost robots.

SPJun 8, 2022
Binary Single-dimensional Convolutional Neural Network for Seizure Prediction

Shiqi Zhao, Jie Yang, Yankun Xu et al.

Nowadays, several deep learning methods are proposed to tackle the challenge of epileptic seizure prediction. However, these methods still cannot be implemented as part of implantable or efficient wearable devices due to their large hardware and corresponding high-power consumption. They usually require complex feature extraction process, large memory for storing high precision parameters and complex arithmetic computation, which greatly increases required hardware resources. Moreover, available yield poor prediction performance, because they adopt network architecture directly from image recognition applications fails to accurately consider the characteristics of EEG signals. We propose in this paper a hardware-friendly network called Binary Single-dimensional Convolutional Neural Network (BSDCNN) intended for epileptic seizure prediction. BSDCNN utilizes 1D convolutional kernels to improve prediction performance. All parameters are binarized to reduce the required computation and storage, except the first layer. Overall area under curve, sensitivity, and false prediction rate reaches 0.915, 89.26%, 0.117/h and 0.970, 94.69%, 0.095/h on American Epilepsy Society Seizure Prediction Challenge (AES) dataset and the CHB-MIT one respectively. The proposed architecture outperforms recent works while offering 7.2 and 25.5 times reductions on the size of parameter and computation, respectively.

ROSep 22, 2022
MUI-TARE: Multi-Agent Cooperative Exploration with Unknown Initial Position

Jingtian Yan, Xingqiao Lin, Zhongqiang Ren et al.

Multi-agent exploration of a bounded 3D environment with unknown initial positions of agents is a challenging problem. It requires quickly exploring the environments as well as robustly merging the sub-maps built by the agents. We take the view that the existing approaches are either aggressive or conservative: Aggressive strategies merge two sub-maps built by different agents together when overlap is detected, which can lead to incorrect merging due to the false-positive detection of the overlap and is thus not robust. Conservative strategies direct one agent to revisit an excessive amount of the historical trajectory of another agent for verification before merging, which can lower the exploration efficiency due to the repeated exploration of the same space. To intelligently balance the robustness of sub-map merging and exploration efficiency, we develop a new approach for lidar-based multi-agent exploration, which can direct one agent to repeat another agent's trajectory in an \emph{adaptive} manner based on the quality indicator of the sub-map merging process. Additionally, our approach extends the recent single-agent hierarchical exploration strategy to multiple agents in a \emph{cooperative} manner by planning for agents with merged sub-maps together to further improve exploration efficiency. Our experiments show that our approach is up to 50\% more efficient than the baselines on average while merging sub-maps robustly.

ROMay 8, 2024Code
General Place Recognition Survey: Towards Real-World Autonomy

Peng Yin, Jianhao Jiao, Shiqi Zhao et al.

In the realm of robotics, the quest for achieving real-world autonomy, capable of executing large-scale and long-term operations, has positioned place recognition (PR) as a cornerstone technology. Despite the PR community's remarkable strides over the past two decades, garnering attention from fields like computer vision and robotics, the development of PR methods that sufficiently support real-world robotic systems remains a challenge. This paper aims to bridge this gap by highlighting the crucial role of PR within the framework of Simultaneous Localization and Mapping (SLAM) 2.0. This new phase in robotic navigation calls for scalable, adaptable, and efficient PR solutions by integrating advanced artificial intelligence (AI) technologies. For this goal, we provide a comprehensive review of the current state-of-the-art (SOTA) advancements in PR, alongside the remaining challenges, and underscore its broad applications in robotics. This paper begins with an exploration of PR's formulation and key research challenges. We extensively review literature, focusing on related methods on place representation and solutions to various PR challenges. Applications showcasing PR's potential in robotics, key PR datasets, and open-source libraries are discussed. We conclude with a discussion on PR's future directions and provide a summary of the literature covered at: https://github.com/MetaSLAM/GPRS.

70.6ROMar 23
Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control

Qingrui Zhao, Kaiyue Yang, Xiyu Wang et al.

Humanoid robots require diverse motor skills to integrate into complex environments, but bridging the kinematic and dynamic embodiment gap from human data remains a major bottleneck. We demonstrate through Hessian analysis that traditional optimization-based retargeting is inherently non-convex and prone to local optima, leading to physical artifacts like joint jumps and self-penetration. To address this, we reformulate the targeting problem as learning data distribution rather than optimizing optimal solutions, where we propose NMR, a Neural Motion Retargeting framework that transforms static geometric mapping into a dynamics-aware learned process. We first propose Clustered-Expert Physics Refinement (CEPR), a hierarchical data pipeline that leverages VAE-based motion clustering to group heterogeneous movements into latent motifs. This strategy significantly reduces the computational overhead of massively parallel reinforcement learning experts, which project and repair noisy human demonstrations onto the robot's feasible motion manifold. The resulting high-fidelity data supervises a non-autoregressive CNN-Transformer architecture that reasons over global temporal context to suppress reconstruction noise and bypass geometric traps. Experiments on the Unitree G1 humanoid across diverse dynamic tasks (e.g., martial arts, dancing) show that NMR eliminates joint jumps and significantly reduces self-collisions compared to state-of-the-art baselines. Furthermore, NMR-generated references accelerate the convergence of downstream whole-body control policies, establishing a scalable path for bridging the human-robot embodiment gap.

LGNov 16, 2025Code
Personality-guided Public-Private Domain Disentangled Hypergraph-Former Network for Multimodal Depression Detection

Changzeng Fu, Shiwen Zhao, Yunze Zhang et al.

Depression represents a global mental health challenge requiring efficient and reliable automated detection methods. Current Transformer- or Graph Neural Networks (GNNs)-based multimodal depression detection methods face significant challenges in modeling individual differences and cross-modal temporal dependencies across diverse behavioral contexts. Therefore, we propose P$^3$HF (Personality-guided Public-Private Domain Disentangled Hypergraph-Former Network) with three key innovations: (1) personality-guided representation learning using LLMs to transform discrete individual features into contextual descriptions for personalized encoding; (2) Hypergraph-Former architecture modeling high-order cross-modal temporal relationships; (3) event-level domain disentanglement with contrastive learning for improved generalization across behavioral contexts. Experiments on MPDD-Young dataset show P$^3$HF achieves around 10\% improvement on accuracy and weighted F1 for binary and ternary depression classification task over existing methods. Extensive ablation studies validate the independent contribution of each architectural component, confirming that personality-guided representation learning and high-order hypergraph reasoning are both essential for generating robust, individual-aware depression-related representations. The code is released at https://github.com/hacilab/P3HF.

CLMar 31, 2022Code
PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained Language Model

Fei Mi, Yitong Li, Yulong Zeng et al.

In this paper, we introduce PanGu-Bot, a Chinese pre-trained open-domain dialogue generation model based on a large pre-trained language model (PLM) PANGU-alpha (Zeng et al.,2021). Different from other pre-trained dialogue models trained over a massive amount of dialogue data from scratch, we aim to build a powerful dialogue model with relatively fewer data and computation costs by inheriting valuable language capabilities and knowledge from PLMs. To this end, we train PanGu-Bot from the large PLM PANGU-alpha, which has been proven well-performed on a variety of Chinese natural language tasks. We investigate different aspects of responses generated by PanGu-Bot, including response quality, knowledge, and safety. We show that PanGu-Bot outperforms state-of-the-art Chinese dialogue systems (CDIALGPT (Wang et al., 2020), EVA (Zhou et al., 2021), EVA2.0 (Gu et al., 2022)) w.r.t. the above three aspects. We also demonstrate that PanGu-Bot can be easily deployed to generate emotional responses without further training. Throughout our empirical analysis, we also point out that the PanGu-Bot response quality, knowledge correctness, and safety are still far from perfect, and further explorations are indispensable to building reliable and smart dialogue systems. Our model and code will be available at https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/PanGu-Bot soon.

AIMay 15, 2025
The First MPDD Challenge: Multimodal Personality-aware Depression Detection

Changzeng Fu, Zelin Fu, Qi Zhang et al.

Depression is a widespread mental health issue affecting diverse age groups, with notable prevalence among college students and the elderly. However, existing datasets and detection methods primarily focus on young adults, neglecting the broader age spectrum and individual differences that influence depression manifestation. Current approaches often establish a direct mapping between multimodal data and depression indicators, failing to capture the complexity and diversity of depression across individuals. This challenge includes two tracks based on age-specific subsets: Track 1 uses the MPDD-Elderly dataset for detecting depression in older adults, and Track 2 uses the MPDD-Young dataset for detecting depression in younger participants. The Multimodal Personality-aware Depression Detection (MPDD) Challenge aims to address this gap by incorporating multimodal data alongside individual difference factors. We provide a baseline model that fuses audio and video modalities with individual difference information to detect depression manifestations in diverse populations. This challenge aims to promote the development of more personalized and accurate de pression detection methods, advancing mental health research and fostering inclusive detection systems. More details are available on the official challenge website: https://hacilab.github.io/MPDDChallenge.github.io.

SPAug 17, 2021
An End-to-End Deep Learning Approach for Epileptic Seizure Prediction

Yankun Xu, Jie Yang, Shiqi Zhao et al.

An accurate seizure prediction system enables early warnings before seizure onset of epileptic patients. It is extremely important for drug-refractory patients. Conventional seizure prediction works usually rely on features extracted from Electroencephalography (EEG) recordings and classification algorithms such as regression or support vector machine (SVM) to locate the short time before seizure onset. However, such methods cannot achieve high-accuracy prediction due to information loss of the hand-crafted features and the limited classification ability of regression and SVM algorithms. We propose an end-to-end deep learning solution using a convolutional neural network (CNN) in this paper. One and two dimensional kernels are adopted in the early- and late-stage convolution and max-pooling layers, respectively. The proposed CNN model is evaluated on Kaggle intracranial and CHB-MIT scalp EEG datasets. Overall sensitivity, false prediction rate, and area under receiver operating characteristic curve reaches 93.5%, 0.063/h, 0.981 and 98.8%, 0.074/h, 0.988 on two datasets respectively. Comparison with state-of-the-art works indicates that the proposed model achieves exceeding prediction performance.

NEFeb 25, 2021
A New Neuromorphic Computing Approach for Epileptic Seizure Prediction

Fengshi Tian, Jie Yang, Shiqi Zhao et al.

Several high specificity and sensitivity seizure prediction methods with convolutional neural networks (CNNs) are reported. However, CNNs are computationally expensive and power hungry. These inconveniences make CNN-based methods hard to be implemented on wearable devices. Motivated by the energy-efficient spiking neural networks (SNNs), a neuromorphic computing approach for seizure prediction is proposed in this work. This approach uses a designed gaussian random discrete encoder to generate spike sequences from the EEG samples and make predictions in a spiking convolutional neural network (Spiking-CNN) which combines the advantages of CNNs and SNNs. The experimental results show that the sensitivity, specificity and AUC can remain 95.1%, 99.2% and 0.912 respectively while the computation complexity is reduced by 98.58% compared to CNN, indicating that the proposed Spiking-CNN is hardware friendly and of high precision.

CLNov 14, 2017
DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

Wei He, Kai Liu, Jing Liu et al.

This paper introduces DuReader, a new large-scale, open-domain Chinese ma- chine reading comprehension (MRC) dataset, designed to address real-world MRC. DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao; answers are manually generated. (2) question types: it provides rich annotations for more question types, especially yes-no and opinion questions, that leaves more opportunity for the research community. (3) scale: it contains 200K questions, 420K answers and 1M documents; it is the largest Chinese MRC dataset so far. Experiments show that human performance is well above current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements. To help the community make these improvements, both DuReader and baseline systems have been posted online. We also organize a shared competition to encourage the exploration of more models. Since the release of the task, there are significant improvements over the baselines.