Zirui Li

RO
h-index: 40
Papers: 50
Citations: 1,975
Novelty: 42%
AI Score: 57

50 Papers

QUANT-PH · Oct 30, 2022
QuEst: Graph Transformer for Quantum Circuit Reliability Estimation

Hanrui Wang, Pengyu Liu, Jinglei Cheng et al. · mit

Among different quantum algorithms, PQC for QML shows promise on near-term devices. To facilitate QML and PQC research, a recent Python library called TorchQuantum has been released. It can construct, simulate, and train PQC for machine learning tasks with high speed and convenient debugging support. Besides quantum for ML, we want to raise the community's attention to the reverse direction: ML for quantum. Specifically, the TorchQuantum library also supports using data-driven ML models to solve problems in quantum system research, such as predicting the impact of quantum noise on circuit fidelity and improving the quantum circuit compilation efficiency. This paper presents a case study of the ML for quantum part. Since estimating the noise impact on circuit reliability is an essential step toward understanding and mitigating noise, we propose to leverage classical ML to predict noise impact on circuit fidelity. Inspired by the natural graph representation of quantum circuits, we propose to leverage a graph transformer model to predict the noisy circuit fidelity. We first collect a large dataset with a variety of quantum circuits and obtain their fidelity on noisy simulators and real machines. Then we embed each circuit into a graph with gate and noise properties as node features, and adopt a graph transformer to predict the fidelity. Evaluated on 5 thousand random and algorithm circuits, the graph transformer predictor provides accurate fidelity estimation with an RMSE of 0.04 and outperforms a simple neural network-based model by 0.02 on average. It achieves R² scores of 0.99 and 0.95 for random and algorithm circuits, respectively. Compared with circuit simulators, the predictor has over 200X speedup for estimating the fidelity.

CV · Nov 18, 2022 · Code
Leveraging Multi-stream Information Fusion for Trajectory Prediction in Low-illumination Scenarios: A Multi-channel Graph Convolutional Approach

Hailong Gong, Zirui Li, Chao Lu et al.

Trajectory prediction is a fundamental problem and challenge for autonomous vehicles. Early works mainly focused on designing complicated architectures for deep-learning-based prediction models in normal-illumination environments, which fail to deal with low-light conditions. This paper proposes a novel approach for trajectory prediction in low-illumination scenarios by leveraging multi-stream information fusion, which flexibly integrates image, optical flow, and object trajectory information. The image channel employs Convolutional Neural Network (CNN) and Long Short-term Memory (LSTM) networks to extract temporal information from the camera. The optical flow channel is applied to capture the pattern of relative motion between adjacent camera frames and is modelled by a Spatial-Temporal Graph Convolutional Network (ST-GCN). The trajectory channel is used to recognize high-level interactions between vehicles. Finally, information from all three channels is fused in the prediction module to generate future trajectories of surrounding vehicles in low-illumination conditions. The proposed multi-channel graph convolutional approach is validated on HEV-I and the newly generated Dark-HEV-I, egocentric vision datasets that primarily focus on urban intersection scenarios. The results demonstrate that our method outperforms the baselines in both standard and low-illumination scenarios. Additionally, our approach is generic and applicable to scenarios with different types of perception data. The source code of the proposed approach is available at https://github.com/TommyGong08/MSIF.

AR · May 31
Linear Complexity Fermionic Simulation on Quantum Devices with Hardware Connectivity Constraints

Xiangyu Gao, Winston Li, Jiakang Li et al.

Simulating fermionic systems on quantum hardware requires compiling fermionic Hamiltonians into executable quantum circuits. Existing approaches treat each compilation stage independently, applying heuristics with localized objectives that produce circuits with superquartic gate count and depth scaling and compilation times reaching several hours for large instances. We present Accordion, an end-to-end framework that co-designs the fermion-to-qubit mapping with circuit synthesis and hardware routing. Accordion fixes the Jordan Wigner mapping, which despite its higher Pauli weight produces Pauli operators with structural regularity that enables provably efficient circuit generation. For full-rank all-to-all electronic structure Hamiltonians, we prove O(N^4) gate count and circuit depth, matching the information-theoretic lower bound imposed by the Theta(N^4) second excitation terms. On linear, IBM heavy-hex, and square-grid architectures, Accordion reduces gate count by up to 79% and circuit depth by up to 77% relative to the best baseline.

CV · Mar 18
Video Understanding: From Geometry and Semantics to Unified Models

Zhaochong An, Zirui Li, Mingqiao Ye et al. · cambridge

Video understanding aims to enable models to perceive, reason about, and interact with the dynamic visual world. In contrast to image understanding, video understanding inherently requires modeling temporal dynamics and evolving visual context, placing stronger demands on spatiotemporal reasoning and making it a foundational problem in computer vision. In this survey, we present a structured overview of video understanding by organizing the literature into three complementary perspectives: low-level video geometry understanding, high-level semantic understanding, and unified video understanding models. We further highlight a broader shift from isolated, task-specific pipelines toward unified modeling paradigms that can be adapted to diverse downstream objectives, enabling a more systematic view of recent progress. By consolidating these perspectives, this survey provides a coherent map of the evolving video understanding landscape, summarizes key modeling trends and design principles, and outlines open challenges toward building robust, scalable, and unified video foundation models.

CV · May 21 · Code
JMed48k: A Multi-Profession Japanese Medical Licensing Benchmark for Vision-Language Model Evaluation

Yue Xun, Junyu Liu, Qian Niu et al.

We introduce JMed48k, a multi-profession Japanese healthcare licensing benchmark for evaluating vision-language models. Built from official PDF materials released by the Japanese Ministry of Health, Labour and Welfare, JMed48k contains 48,862 exam questions and 20,142 images from 11 national licensing examinations between 2005 and 2025, with visual content annotated under an 8-type taxonomy. From this corpus, we derive JMed48k-Eval, a recent five-year evaluation subset with 12,484 scored questions, including 9,905 text-only questions and 2,579 questions with images. We evaluate 21 proprietary, open-source, and medical-specific models, reporting text-only and with-image performance separately. Because these subsets contain different questions, we further introduce a paired image-removal audit that evaluates questions with images before and after removing visual content to explore four answer-transition states. The audit shows that proprietary and open source models gain substantially from images, whereas medical-specific systems show limited observable use of visual evidence, with many correct answers persisting after image removal. Even among proprietary models, the net image-removal effect varies sevenfold across professions, from +5.7 points on Physician questions to +39.8 points on Public Health Nurse questions. We release JMed48k to support reproducible, profession-stratified evaluation of vision-language models in medical licensing settings.

RO · Jul 24, 2022
Adaptive Decision Making at the Intersection for Autonomous Vehicles Based on Skill Discovery

Xianqi He, Lin Yang, Chao Lu et al.

In urban environments, the complex and uncertain intersection scenarios are challenging for autonomous driving. To ensure safety, it is crucial to develop an adaptive decision making system that can handle the interaction with other vehicles. Manually designed model-based methods are reliable in common scenarios. But in uncertain environments, they are not reliable, so learning-based methods are proposed, especially reinforcement learning (RL) methods. However, current RL methods need retraining when the scenarios change. In other words, current RL methods cannot reuse accumulated knowledge. They forget learned knowledge when new scenarios are given. To solve this problem, we propose a hierarchical framework that can autonomously accumulate and reuse knowledge. The proposed method combines the idea of motion primitives (MPs) with hierarchical reinforcement learning (HRL). It decomposes complex problems into multiple basic subtasks to reduce the difficulty. The proposed method and other baseline methods are tested in a challenging intersection scenario based on the CARLA simulator. The intersection scenario contains three different subtasks that can reflect the complexity and uncertainty of real traffic flow. After offline learning and testing, the proposed method is proved to have the best performance among all methods.

QUANT-PH · Nov 27, 2023
RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training

Hanrui Wang, Yilian Liu, Pengyu Liu et al.

Quantum state preparation, a crucial subroutine in quantum computing, involves generating a target quantum state from initialized qubits. Arbitrary state preparation algorithms can be broadly categorized into arithmetic decomposition (AD) and variational quantum state preparation (VQSP). AD employs a predefined procedure to decompose the target state into a series of gates, whereas VQSP iteratively tunes ansatz parameters to approximate the target state. VQSP is particularly apt for Noisy Intermediate-Scale Quantum (NISQ) machines due to its shorter circuits. However, achieving noise-robust parameter optimization remains challenging. We present RobustState, a novel VQSP training methodology that combines high robustness with high training efficiency. The core idea involves utilizing measurement outcomes from real machines to perform back-propagation through classical simulators, thus incorporating real quantum noise into gradient calculations. RobustState serves as a versatile, plug-and-play technique applicable for training parameters from scratch or fine-tuning existing parameters to enhance fidelity on target machines. It is adaptable to various ansatzes at both gate and pulse levels and can even benefit other variational algorithms, such as variational unitary synthesis. Comprehensive evaluation of RobustState on state preparation tasks for 4 distinct quantum algorithms using 10 real quantum machines demonstrates a coherent error reduction of up to 7.1× and state fidelity improvement of up to 96% and 81% for 4-Q and 5-Q states, respectively. On average, RobustState improves fidelity by 50% and 72% for 4-Q and 5-Q states compared to baseline approaches.

HC · Mar 15
Perceived risk evolution in automated driving inferred from large-scale discrete ratings

Xiaolin He, Zirui Li, Xinwei Wang et al.

Perceived risk in automated driving is often measured as discrete scores that summarise riding experience but this obscures volatile peaks from sustained elevation. Here we treat discrete clipwise ratings as constraints on an unobserved inferred evolution and apply a kernel constrained inverse model to infer the temporal evolution of perceived risk. Across 2,164 participants and 141,628 discrete clipwise ratings spanning 236 hours of scripted motorway interactions, we infer evolutions under kernel constraints whose shapes follow priors from independent handset-based ratings and whose timing is fixed by scripted manoeuvre markers. The inferred perceived risk evolutions differentiate accumulated perceived risk from within clip concentration, revealing scenario differences that are not identifiable from peak judgements alone. We then map these inferred evolutions from observable vehicle and relative motion cues under strict event level holdout using a deep neural network, enabling interpretable attribution analyses. Attribution shows distinct patterns between risk rising and falling segments, with a shift toward conflict cues in the rising phase, and a rebound toward stability cues in the falling phase. Attribution concentration increases only modestly at high perceived risk levels. These results move beyond treating perceived risk as a single severity score by characterising within episode dynamics and phase dependent cue associations in scripted motorway interactions.

AI · May 21
PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning

Lingyu Jiang, Zirui Li, Shuo Xing et al.

The emergence of Large Reasoning Language Models (LRMs) has paved the way for tackling complex reasoning tasks through test-time scaling by generating long-form Chain-of-Thought (CoT) trajectories during inference. Meanwhile, these trajectories often contain explicit reflection markers such as "wait", "but", and "alternatively", signaling hesitation, revision, and the consideration of alternative explorations, respectively. Recent studies on test-time control leverage such markers as lightweight handles for steering reasoning, typically treating them as a single coarse-grained category rather than distinguishing their distinct functional roles. In this paper, we conduct type-wise suppression and fixed-prefix intervention, revealing that reflection markers differ not only in their functional roles but also in when they exert the greatest influence. Specifically, different marker classes affect accuracy and generation length in distinct ways, and marker choices are most consequential before the model settles into a stable reasoning trajectory. Motivated by these findings, we introduce PathCal, a novel training-free decoding controller that calibrates reasoning paths by distinguishing marker types and intervening only at locally uncertain states. At each decoding step, PathCal utilizes the distribution over reflection markers to estimate local competition between maintaining the current reasoning trajectory and initiating a competing branch, and softly rebalances marker logits when competing-branch evidence becomes excessive. Experiments across six reasoning benchmarks demonstrate that PathCal achieves a better efficiency-performance trade-off, improving or preserving accuracy while reducing generation length, without relying on external verifiers or additional sampling.

SY · May 3, 2022
Prediction-Based Reachability Analysis for Collision Risk Assessment on Highways

Xinwei Wang, Zirui Li, Javier Alonso-Mora et al.

Real-time safety systems are crucial components of intelligent vehicles. This paper introduces a prediction-based collision risk assessment approach on highways. Given a point mass vehicle dynamics system, a stochastic forward reachable set considering two-dimensional motion with vehicle state probability distributions is firstly established. We then develop an acceleration prediction model, which provides multi-modal probabilistic acceleration distributions to propagate vehicle states. The collision probability is calculated by summing up the probabilities of the states where two vehicles spatially overlap. Simulation results show that the prediction model has superior performance in terms of vehicle motion position errors, and the proposed collision detection approach is agile and effective to identify the collision in cut-in crash events.
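
The collision-probability step described in this abstract (summing the probabilities of states where two vehicles spatially overlap) can be illustrated with a minimal sketch. The discrete position distributions, the independence of the two vehicles, the rectangular overlap test, and the `length` and `width` defaults are all assumptions made for illustration; they are not taken from the paper.

```python
def collision_probability(dist_a, dist_b, length=4.5, width=1.8):
    """Sum joint probabilities of position pairs where the vehicles overlap.

    dist_a, dist_b: dicts mapping (x, y) centre positions in metres to
    probabilities. The two vehicles are assumed independent, so the joint
    probability of a pair of positions is the product of the marginals.
    length, width: hypothetical vehicle footprint used for the overlap test.
    """
    total = 0.0
    for (xa, ya), pa in dist_a.items():
        for (xb, yb), pb in dist_b.items():
            # Axis-aligned rectangles of equal size overlap when the centres
            # are closer than one vehicle length and one vehicle width.
            if abs(xa - xb) < length and abs(ya - yb) < width:
                total += pa * pb
    return total


# Made-up predicted position distributions at one future time step.
ego = {(0.0, 0.0): 0.5, (10.0, 0.0): 0.5}
cut_in = {(1.0, 0.0): 0.4, (20.0, 0.0): 0.6}
risk = collision_probability(ego, cut_in)
```

Only the pair (0, 0) and (1, 0) overlaps here, so `risk` is the product of those two probabilities.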

AI · Feb 9
Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure

Zirui Li, Xuefeng Bai, Kehai Chen et al.

Latent or continuous chain-of-thought methods replace explicit textual rationales with a number of internal latent steps, but these intermediate computations are difficult to evaluate beyond correlation-based probes. In this paper, we view latent chain-of-thought as a manipulable causal process in representation space by modeling latent steps as variables in a structural causal model (SCM) and analyzing their effects through step-wise $\mathrm{do}$-interventions. We study two representative paradigms (i.e., Coconut and CODI) on both mathematical and general reasoning tasks to investigate three key questions: (1) which steps are causally necessary for correctness and when answers become decidable early; (2) how does influence propagate across steps, and how does this structure compare to explicit CoT; and (3) do intermediate trajectories retain competing answer modes, and how does output-level commitment differ from representational commitment across steps. We find that latent-step budgets behave less like homogeneous extra depth and more like staged functionality with non-local routing, and we identify a persistent gap between early output bias and late representational commitment. These results motivate mode-conditional and stability-aware analyses -- and corresponding training/decoding objectives -- as more reliable tools for interpreting and improving latent reasoning systems.

HC · Apr 12
Adaptive Bounded-Rationality Modeling of Early-Stage Takeover in Shared-Control Driving

Jian Sun, Xiyan Jiang, Xiaocong Zhao et al.

Human drivers' control quality in the first seconds after a handover is critical to shared-driving safety; potentially unsafe steering or pedal inputs therefore require detection and correction by the automated vehicle's safety-fallback system. Yet performance in this window is vulnerable because cognitive states fluctuate rapidly, causing purely rationality-driven, cognition-unaware models to miss early control dynamics. We present an interpretable driver model grounded in bounded rationality with online adaptation that predicts early-stage control quality. We encode boundedness by embedding cognitive constraints in reinforcement learning and adapt latent cognitive parameters in real time via particle filtering from observations of driver actions. In a vehicle-in-the-loop study (n=41), we evaluated predictive performance and physiological validity. The adaptive model not only anticipated hazardous takeovers with higher coverage and longer lead times than non-adaptive baselines but also demonstrated strong alignment between inferred cognitive parameters and real-time eye-tracking metrics. These results confirm that the model captures genuine fluctuations in driver risk perception, enabling timely and cognitively grounded assistance.

AI · Mar 28
Beyond Completion: Probing Cumulative State Tracking to Predict LLM Agent Performance

Dengzhe Hou, Lingyu Jiang, Deng Li et al.

Task-completion rate is the standard proxy for LLM agent capability, but models with identical completion scores can differ substantially in their ability to track intermediate state. We introduce Working Memory Fidelity-Active Manipulation (WMF-AM), a calibrated no-scratchpad probe of cumulative arithmetic state tracking, and evaluate it on 20 open-weight models (0.5B-35B, 13 families) against a released deterministic 10-task agent battery. In a pre-specified, Bonferroni-corrected analysis, WMF-AM predicts agent performance with Kendall's tau = 0.612 (p < 0.001, 95% CI [0.360, 0.814]); exploratory partial-tau analyses suggest this signal persists after controlling for completion score and model scale. Three construct-isolation ablations (K = 1 control, non-arithmetic ceiling, yoked cancellation) support the interpretation that cumulative state tracking under load, rather than single-step arithmetic or entity tracking alone, is the primary difficulty source. K-calibration keeps the probe in a discriminative range where prior fixed-depth benchmarks become non-discriminative; generalization beyond this open-weight sample remains open.
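
For readers unfamiliar with the rank statistic reported above, the following is a minimal sketch of Kendall's tau computed from concordant and discordant pairs. It implements the tie-ignoring tau-a variant, which may differ from the variant used in the paper, and the score lists are made-up values, not data from the study.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = 0
    discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant when both rankings order it the same way.
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical probe scores and agent scores for four models.
probe_scores = [0.2, 0.4, 0.6, 0.8]
agent_scores = [0.1, 0.5, 0.3, 0.9]
tau = kendall_tau_a(probe_scores, agent_scores)
```

With four models there are six pairs; five are ordered the same way by both scores and one is swapped, so tau is (5 - 1) / 6.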

MA · Aug 14, 2024
A Nested Graph Reinforcement Learning-based Decision-making Strategy for Eco-platooning

Xin Gao, Xueyuan Li, Hao Liu et al.

Platooning technology is renowned for its precise vehicle control, traffic flow optimization, and energy efficiency enhancement. However, in large-scale mixed platoons, vehicle heterogeneity and unpredictable traffic conditions lead to virtual bottlenecks. These bottlenecks result in reduced traffic throughput and increased energy consumption within the platoon. To address these challenges, we introduce a decision-making strategy based on nested graph reinforcement learning. This strategy improves collaborative decision-making, ensuring energy efficiency and alleviating congestion. We propose a theory of nested traffic graph representation that maps dynamic interactions between vehicles and platoons in non-Euclidean spaces. By incorporating spatio-temporal weighted graph into a multi-head attention mechanism, we further enhance the model's capacity to process both local and global data. Additionally, we have developed a nested graph reinforcement learning framework to enhance the self-iterative learning capabilities of platooning. Using the I-24 dataset, we designed and conducted comparative algorithm experiments, generalizability testing, and permeability ablation experiments, thereby validating the proposed strategy's effectiveness. Compared to the baseline, our strategy increases throughput by 10% and decreases energy use by 9%. Specifically, increasing the penetration rate of CAVs significantly enhances traffic throughput, though it also increases energy consumption.

LG · Aug 27, 2025 · Code
Escaping Stability-Plasticity Dilemma in Online Continual Learning for Motion Forecasting via Synergetic Memory Rehearsal

Yunlong Lin, Chao Lu, Tongshuai Wu et al.

Deep neural networks (DNN) have achieved remarkable success in motion forecasting. However, most DNN-based methods suffer from catastrophic forgetting and fail to maintain their performance in previously learned scenarios after adapting to new data. Recent continual learning (CL) studies aim to mitigate this phenomenon by enhancing memory stability of DNN, i.e., the ability to retain learned knowledge. Yet, excessive emphasis on the memory stability often impairs learning plasticity, i.e., the capacity of DNN to acquire new information effectively. To address such stability-plasticity dilemma, this study proposes a novel CL method, synergetic memory rehearsal (SyReM), for DNN-based motion forecasting. SyReM maintains a compact memory buffer to represent learned knowledge. To ensure memory stability, it employs an inequality constraint that limits increments in the average loss over the memory buffer. Synergistically, a selective memory rehearsal mechanism is designed to enhance learning plasticity by selecting samples from the memory buffer that are most similar to recently observed data. This selection is based on an online-measured cosine similarity of loss gradients, ensuring targeted memory rehearsal. Since replayed samples originate from learned scenarios, this memory rehearsal mechanism avoids compromising memory stability. We validate SyReM under an online CL paradigm where training samples from diverse scenarios arrive as a one-pass stream. Experiments on 11 naturalistic driving datasets from INTERACTION demonstrate that, compared to non-CL and CL baselines, SyReM significantly mitigates catastrophic forgetting in past scenarios while improving forecasting accuracy in new ones. The implementation is publicly available at https://github.com/BIT-Jack/SyReM.
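
The selective rehearsal idea in this abstract (picking buffer samples whose loss gradients are most similar to those of recent data) can be sketched as follows. This is an illustration only, not the SyReM implementation: the function names, the dictionary layout of the buffer, and the top-k selection rule are assumptions, and the gradients are toy two-dimensional vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_rehearsal_samples(buffer_grads, new_grad, k):
    """Return the ids of the k buffer samples most aligned with new_grad.

    buffer_grads: dict mapping a sample id to its loss-gradient vector.
    new_grad: loss gradient measured on the recently observed data.
    """
    ranked = sorted(
        buffer_grads,
        key=lambda sid: cosine_similarity(buffer_grads[sid], new_grad),
        reverse=True,
    )
    return ranked[:k]


# Toy memory buffer with hypothetical per-sample loss gradients.
memory = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]}
selected = select_rehearsal_samples(memory, new_grad=[1.0, 0.1], k=2)
```

Samples whose gradients point in nearly the same direction as the new-data gradient are replayed first; the abstract's inequality constraint on the average memory loss is not modelled here.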

AI · Aug 2, 2025 · Code
H2C: Hippocampal Circuit-inspired Continual Learning for Lifelong Trajectory Prediction in Autonomous Driving

Yunlong Lin, Zirui Li, Guodong Du et al.

Deep learning (DL) has shown state-of-the-art performance in trajectory prediction, which is critical to safe navigation in autonomous driving (AD). However, most DL-based methods suffer from catastrophic forgetting, where adapting to a new distribution may cause significant performance degradation in previously learned ones. Such inability to retain learned knowledge limits their applicability in the real world, where AD systems need to operate across varying scenarios with dynamic distributions. As revealed by neuroscience, the hippocampal circuit plays a crucial role in memory replay, effectively reconstructing learned knowledge based on limited resources. Inspired by this, we propose a hippocampal circuit-inspired continual learning method (H2C) for trajectory prediction across varying scenarios. H2C retains prior knowledge by selectively recalling a small subset of learned samples. First, two complementary strategies are developed to select the subset to represent learned knowledge. Specifically, one strategy maximizes inter-sample diversity to represent the distinctive knowledge, and the other estimates the overall knowledge by equiprobable sampling. Then, H2C updates via a memory replay loss function calculated by these selected samples to retain knowledge while learning new data. Experiments based on various scenarios from the INTERACTION dataset are designed to evaluate H2C. Experimental results show that H2C reduces catastrophic forgetting of DL baselines by 22.71% on average in a task-free manner, without relying on manually informed distributional shifts. The implementation is available at https://github.com/BIT-Jack/H2C-lifelong.

QUANT-PH · Jan 10, 2024 · Code
QuantumSEA: In-Time Sparse Exploration for Noise Adaptive Quantum Circuits

Tianlong Chen, Zhenyu Zhang, Hanrui Wang et al.

Parameterized Quantum Circuits (PQC) have gained increasing popularity thanks to their great potential for near-term Noisy Intermediate-Scale Quantum (NISQ) computers. Achieving quantum advantages usually requires a large number of qubits and quantum circuits with enough capacity. However, limited coherence time and massive quantum noise severely constrain the size of quantum circuits that can be executed reliably on real machines. To address these two pain points, we propose QuantumSEA, an in-time sparse exploration for noise-adaptive quantum circuits, aiming to achieve two key objectives: (1) implicit circuit capacity during training - by dynamically exploring the circuit's sparse connectivity and sticking to a fixed small number of quantum gates throughout training, which satisfies the coherence time and incurs light noise, enabling feasible executions on real quantum devices; (2) noise robustness - by jointly optimizing the topology and parameters of quantum circuits under real device noise models. In each update step of sparsity, we leverage the moving average of historical gradients to grow necessary gates and utilize salience-based pruning to eliminate insignificant gates. Extensive experiments are conducted with 7 Quantum Machine Learning (QML) and Variational Quantum Eigensolver (VQE) benchmarks on 6 simulated or real quantum computers, where QuantumSEA consistently surpasses noise-aware search, human-designed, and randomly generated quantum circuit baselines by a clear performance margin. For example, even in the most challenging on-chip training regime, our method establishes state-of-the-art results with only half the number of quantum gates and ~2x time saving of circuit executions. Codes are available at https://github.com/VITA-Group/QuantumSEA.

QUANT-PH · Feb 26, 2022 · Code
QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning

Hanrui Wang, Zirui Li, Jiaqi Gu et al.

Parameterized Quantum Circuits (PQC) are drawing increasing research interest thanks to their potential to achieve quantum advantages on near-term Noisy Intermediate Scale Quantum (NISQ) hardware. In order to achieve scalable PQC learning, the training process needs to be offloaded to real quantum machines instead of using exponential-cost classical simulators. One common approach to obtain PQC gradients is parameter shift, whose cost scales linearly with the number of qubits. We present QOC, the first experimental demonstration of practical on-chip PQC training with parameter shift. Nevertheless, we find that due to the significant quantum errors (noises) on real machines, gradients obtained from naive parameter shift have low fidelity and thus degrade the training accuracy. To this end, we further propose probabilistic gradient pruning to first identify gradients with potentially large errors and then remove them. Specifically, small gradients have larger relative errors than large ones, and thus have a higher probability of being pruned. We perform extensive experiments with the Quantum Neural Network (QNN) benchmarks on 5 classification tasks using 5 real quantum machines. The results demonstrate that our on-chip training achieves over 90% and 60% accuracy for 2-class and 4-class image classification tasks. The probabilistic gradient pruning brings up to 7% PQC accuracy improvements over no pruning. Overall, we successfully obtain on-chip training accuracy similar to that of noise-free simulation, with much better training scalability. The QOC code is available in the TorchQuantum library.
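
The two ingredients named in this abstract, parameter-shift gradients and magnitude-dependent probabilistic pruning, can be sketched on a toy problem. This is not the QOC or TorchQuantum code: the analytic `expectation` function stands in for circuit runs on hardware, and the sampling rule in `prune_gradients` (keep probability proportional to gradient magnitude, controlled by a hypothetical `keep_ratio`) is an assumption chosen to match the abstract's description that small gradients are more likely to be pruned.

```python
import math
import random


def expectation(params):
    """Toy stand-in for a PQC: <Z...Z> after RY(theta_i) on each qubit of |0...0>.

    For this product state the expectation is the product of cos(theta_i).
    """
    value = 1.0
    for theta in params:
        value *= math.cos(theta)
    return value


def parameter_shift_grad(f, params, index, shift=math.pi / 2):
    """Gradient w.r.t. params[index] from two shifted evaluations of f."""
    plus = list(params)
    minus = list(params)
    plus[index] += shift
    minus[index] -= shift
    return (f(plus) - f(minus)) / 2.0


def prune_gradients(grads, keep_ratio, rng):
    """Keep a magnitude-weighted random subset of gradients; zero the rest."""
    n_keep = max(1, round(keep_ratio * len(grads)))
    candidates = list(range(len(grads)))
    kept = set()
    while candidates and len(kept) < n_keep:
        weights = [abs(grads[i]) for i in candidates]
        if sum(weights) == 0.0:
            break  # only zero gradients remain; nothing worth keeping
        # Larger gradients are more likely to be drawn, hence less likely pruned.
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
        kept.add(chosen)
        candidates.remove(chosen)
    return [g if i in kept else 0.0 for i, g in enumerate(grads)]


params = [0.3, 1.1, -0.7]
grads = [parameter_shift_grad(expectation, params, i) for i in range(len(params))]
pruned = prune_gradients(grads, keep_ratio=0.67, rng=random.Random(0))
```

Each gradient costs two evaluations of the circuit, one per shifted parameter value, which is what makes the method usable when only measured expectations are available.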

LG · Feb 22, 2022 · Code
A Comparative Study of Deep Reinforcement Learning-based Transferable Energy Management Strategies for Hybrid Electric Vehicles

Jingyi Xu, Zirui Li, Li Gao et al.

Deep reinforcement learning-based energy management strategies (EMS) have become a promising solution for hybrid electric vehicles (HEVs). When driving cycles change, the neural network must be retrained, which is a time-consuming and laborious task. A more efficient way of choosing EMS is to combine deep reinforcement learning (DRL) with transfer learning, which can transfer knowledge from one domain to a new domain, making the network of the new domain converge quickly. Different exploration methods of DRL, including adding action space noise and parameter space noise, are compared against each other in the transfer learning process in this work. Results indicate that the network with added parameter space noise is more stable and converges faster than the others. In conclusion, the best exploration method for transferable EMS is to add noise in the parameter space, while the combination of action space noise and parameter space noise generally performs poorly. Our code is available at https://github.com/BIT-XJY/RL-based-Transferable-EMS.git.

RO · Feb 22, 2022 · Code
An Ensemble Learning Framework for Vehicle Trajectory Prediction in Interactive Scenarios

Zirui Li, Yunlong Lin, Cheng Gong et al.

Precisely modeling interactions and accurately predicting trajectories of surrounding vehicles are essential to the decision-making and path-planning of intelligent vehicles. This paper proposes a novel framework based on ensemble learning to improve the performance of trajectory predictions in interactive scenarios. The framework is termed Interactive Ensemble Trajectory Predictor (IETP). IETP assembles interaction-aware trajectory predictors as base learners to build an ensemble learner. Firstly, each base learner in IETP observes historical trajectories of vehicles in the scene. Then each base learner handles interactions between vehicles to predict trajectories. Finally, an ensemble learner is built to predict trajectories by applying two ensemble strategies on the predictions from all base learners. Predictions generated by the ensemble learner are final outputs of IETP. In this study, three experiments using different data are conducted based on the NGSIM dataset. Experimental results show that IETP improves the predicting accuracy and decreases the variance of errors compared to base learners. In addition, IETP exceeds baseline models with 50% of the training data, indicating that IETP is data-efficient. Moreover, the implementation of IETP is publicly available at https://github.com/BIT-Jack/IETP.

RO · Jan 30, 2022 · Code
Graph Convolution-Based Deep Reinforcement Learning for Multi-Agent Decision-Making in Mixed Traffic Environments

Qi Liu, Zirui Li, Xueyuan Li et al.

An efficient and reliable multi-agent decision-making system is in high demand for the safe and efficient operation of connected autonomous vehicles in intelligent transportation systems. Current research mainly focuses on Deep Reinforcement Learning (DRL) methods. However, in interactive traffic scenarios DRL methods struggle to represent the mutual effects between vehicles and to model the dynamic traffic environment, because the environment representation lacks interactive information; this results in low accuracy when generating cooperative decisions. To tackle these difficulties, this research proposes a framework that supports different Graph Reinforcement Learning (GRL) methods for decision-making and compares their performance in interactive driving scenarios. GRL methods combine a Graph Neural Network (GNN) with DRL to generate better decisions in interactive scenarios of autonomous vehicles: the GNN extracts the features of the interactive scenario, and the DRL framework generates cooperative behaviors. Several GRL approaches are summarized and implemented in the proposed framework. To evaluate them, an interactive driving scenario on a highway with two ramps is constructed, and simulated experiments are carried out on the SUMO platform. Finally, results are analyzed from multiple perspectives and dimensions to compare the characteristics of different GRL approaches in intelligent transportation scenarios. Results show that the GNN represents the interaction between vehicles well, and that combining GNN and DRL improves the generation of lane-change behaviors. The source code of our work can be found at https://github.com/Jacklinkk/TorchGRL.

QUANT-PHJul 22, 2021Code
QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits

Hanrui Wang, Yongshan Ding, Jiaqi Gu et al.

Quantum noise is the key challenge in Noisy Intermediate-Scale Quantum (NISQ) computers. Previous work for mitigating noise has primarily focused on gate-level or pulse-level noise-adaptive compilation. However, limited research efforts have explored a higher level of optimization by making the quantum circuits themselves resilient to noise. We propose QuantumNAS, a comprehensive framework for noise-adaptive co-search of the variational circuit and qubit mapping. Variational quantum circuits are a promising approach for constructing QML and quantum simulation. However, finding the best variational circuit and its optimal parameters is challenging due to the large design space and parameter training cost. We propose to decouple the circuit search and parameter training by introducing a novel SuperCircuit. The SuperCircuit is constructed with multiple layers of pre-defined parameterized gates and trained by iteratively sampling and updating the parameter subsets (SubCircuits) of it. It provides an accurate estimation of SubCircuits performance trained from scratch. Then we perform an evolutionary co-search of SubCircuit and its qubit mapping. The SubCircuit performance is estimated with parameters inherited from SuperCircuit and simulated with real device noise models. Finally, we perform iterative gate pruning and finetuning to remove redundant gates. Extensively evaluated with 12 QML and VQE benchmarks on 14 quantum computers, QuantumNAS significantly outperforms baselines. For QML, QuantumNAS is the first to demonstrate over 95% 2-class, 85% 4-class, and 32% 10-class classification accuracy on real QC. It also achieves the lowest eigenvalue for VQE tasks on H2, H2O, LiH, CH4, BeH2 compared with UCCSD. We also open-source TorchQuantum (https://github.com/mit-han-lab/torchquantum) for fast training of parameterized quantum circuits to facilitate future research.

MLSep 1, 2020Code
Stochastic Graph Recurrent Neural Network

Tijin Yan, Hongwei Zhang, Zirui Li et al.

Representation learning over graph structure data has been widely studied due to its wide application prospects. However, previous methods mainly focus on static graphs while many real-world graphs evolve over time. Modeling such evolution is important for predicting properties of unseen networks. To resolve this challenge, we propose SGRNN, a novel neural architecture that applies stochastic latent variables to simultaneously capture the evolution in node attributes and topology. Specifically, deterministic states are separated from stochastic states in the iterative process to suppress mutual interference. With semi-implicit variational inference integrated to SGRNN, a non-Gaussian variational distribution is proposed to help further improve the performance. In addition, to alleviate KL-vanishing problem in SGRNN, a simple and interpretable structure is proposed based on the lower bound of KL-divergence. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed model. Code is available at https://github.com/StochasticGRNN/SGRNN.

CLMay 29, 2019Code
Choosing Transfer Languages for Cross-Lingual Learning

Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee et al.

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages. However, given a particular task language, it is not clear which language to transfer from, and the standard strategy is to select languages based on ad hoc criteria, usually the intuition of the experimenter. Since a large number of features contribute to the success of cross-lingual transfer (including phylogenetic similarity, typological properties, lexical overlap, or size of available data), even the most enlightened experimenter rarely considers all these factors for the particular task at hand. In this paper, we consider this task of automatically selecting optimal transfer languages as a ranking problem, and build models that consider the aforementioned features to perform this prediction. In experiments on representative NLP tasks, we demonstrate that our model predicts good transfer languages much better than ad hoc baselines considering single features in isolation, and glean insights on what features are most informative for each different NLP tasks, which may inform future ad hoc selection even without use of our method. Code, data, and pre-trained models are available at https://github.com/neulab/langrank

LGMay 8
Same Brain, Different Prediction: How Preprocessing Choices Undermine EEG Decoding Reliability

Dengzhe Hou, Zihao Wu, Lingyu Jiang et al.

Electroencephalography (EEG) is a cornerstone of brain-computer interfaces and clinical neuroscience, yet deep learning models are typically trained and evaluated under a single, unreported preprocessing pipeline. We formalize preprocessing choices as a counterfactual intervention space and show that EEG predictions are surprisingly unstable under this space: across six datasets spanning four paradigms, up to 42% of trial-level predictions flip when only the preprocessing changes, a variability that standard uncertainty methods do not explicitly quantify because they condition on a fixed preprocessing pipeline. We provide three tools to make this instability measurable, decomposable, and reducible. First, a Walsh-Hadamard decomposition of the 2^7 pipeline space reveals that sensitivity is near-additive in practice under the binary intervention design, enabling efficient step-by-step optimization. Second, we introduce Preprocessing Uncertainty (PU), a per-trial diagnostic that captures a dimension of instability complementary to model-based confidence. Third, we study Normalized Adaptive PGI (NA-PGI), a graph-structured regularizer that exploits the compositional structure of preprocessing interventions as one mitigation strategy with clear scope conditions.
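
To make the "near-additive" claim concrete, the sketch below applies a Walsh-Hadamard transform to a response defined over all on/off combinations of binary preprocessing steps. It is an illustration under stated assumptions, not the paper's code: it uses 3 hypothetical steps (2^3 pipelines) instead of 7 for brevity, the per-step effects are made up, and `interaction_share` is a helper name introduced here. In a purely additive response, only the constant term and the single-step coefficients are non-zero.

```python
def walsh_hadamard(values):
    """Fast Walsh-Hadamard transform (unnormalized) of a list whose length is a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def interaction_share(coeffs):
    """Fraction of non-constant energy carried by terms involving two or more steps."""
    total = sum(c * c for k, c in enumerate(coeffs) if k != 0)
    higher = sum(c * c for k, c in enumerate(coeffs) if bin(k).count("1") >= 2)
    return higher / total if total else 0.0


# Hypothetical per-step effects on accuracy for 3 binary preprocessing steps.
effects = [0.04, -0.02, 0.01]

# Bit j of the pipeline index says whether step j is switched on;
# the response here is purely additive by construction.
additive = [sum(e for j, e in enumerate(effects) if (k >> j) & 1) for k in range(8)]
coeffs = walsh_hadamard(additive)
share = interaction_share(coeffs)
```

A share close to zero means the steps can be tuned one at a time, which is the property the abstract relies on for step-by-step optimization.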

ROMay 6
Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout

Haozhuang Chi, Daosheng Qiu, Hao Su et al.

Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics forecasting with auxiliary behavioral and emotional semantic recognition. Operating in a compact latent space constructed from frozen vision-language features, Driver-WM adopts a dual-stream architecture to separately encode external traffic and internal driver states. These streams are directionally coupled via a gated causal injection mechanism, which uses a learned vector gate to modulate external contextual perturbations while strictly enforcing temporal causality. Evaluations on a multi-task assistive driving benchmark demonstrate that Driver-WM yields robust long-horizon geometric forecasting for reactive high-motion maneuvers and improves semantic alignment for both driver and traffic states. Finally, the explicit external-to-internal conditioning allows for controlled test-time interventions to systematically analyze mechanism responses.

CVMar 22
Single-Eye View: Monocular Real-time Perception Package for Autonomous Driving

Haixi Zhang, Aiyinsi Zuo, Zirui Li et al.

Amidst the rapid advancement of camera-based autonomous driving technology, effectiveness is often prioritized with limited attention to computational efficiency. To address this issue, this paper introduces LRHPerception, a real-time monocular perception package for autonomous driving that uses single-view camera video to interpret the surrounding environment. The proposed system combines the computational efficiency of end-to-end learning with the rich representational detail of local mapping methodologies. With significant improvements in object tracking and prediction, road segmentation, and depth estimation integrated into a unified framework, LRHPerception processes monocular image data into a five-channel tensor consisting of RGB, road segmentation, and pixel-level depth estimation, augmented with object detection and trajectory prediction. Experimental results demonstrate strong performance, achieving real-time processing at 29 FPS on a single GPU, representing a 555% speedup over the fastest mapping-based approach.

CVApr 7
Physics-Aware Video Instance Removal Benchmark

Zirui Li, Xinghao Chen, Lingyu Jiang et al.

Video Instance Removal (VIR) requires removing target objects while maintaining background integrity and physical consistency, such as specular reflections and illumination interactions. Despite advancements in text-guided editing, current benchmarks primarily assess visual plausibility, often overlooking the physical causalities, such as lingering shadows, triggered by object removal. We introduce the Physics-Aware Video Instance Removal (PVIR) benchmark, featuring 95 high-quality videos annotated with instance-accurate masks and removal prompts. PVIR is partitioned into Simple and Hard subsets, the latter explicitly targeting complex physical interactions. We evaluate four representative methods, PISCO-Removal, UniVideo, DiffuEraser, and CoCoCo, using a decoupled human evaluation protocol across three dimensions to isolate semantic, visual, and spatial failures: instruction following, rendering quality, and edit exclusivity. Our results show that PISCO-Removal and UniVideo achieve state-of-the-art performance, while DiffuEraser frequently introduces blurring artifacts and CoCoCo struggles significantly with instruction following. The persistent performance drop on the Hard subset highlights the ongoing challenge of recovering complex physical side effects.

LGOct 17, 2024
ArrivalNet: Predicting City-wide Bus/Tram Arrival Time with Two-dimensional Temporal Variation Modeling

Zirui Li, Patrick Wolf, Meng Wang

Accurate arrival time prediction (ATP) of buses and trams plays a crucial role in public transport operations. Current methods focused on modeling one-dimensional temporal information but overlooked the latent periodic information within time series. Moreover, most studies developed algorithms for ATP based on a single or a few routes of public transport, which reduces the transferability of the prediction models and their applicability in public transport management systems. To this end, this paper proposes ArrivalNet, a two-dimensional temporal variation-based multi-step ATP for buses and trams. It decomposes the one-dimensional temporal sequence into intra-periodic and inter-periodic variations, which can be recast into two-dimensional tensors (2D blocks). Each row of a tensor contains the time points within a period, and each column involves the time points at the same intra-periodic index across various periods. The transformed 2D blocks in different frequencies have an image-like feature representation that enables effective learning with computer vision backbones (e.g., convolutional neural network). Drawing on the concept of residual neural network, the 2D block module is designed as a basic module for flexible aggregation. Meanwhile, contextual factors like workdays, peak hours, and intersections, are also utilized in the augmented feature representation to improve the performance of prediction. 125 days of public transport data from Dresden were collected for model training and validation. Experimental results show that the root mean square error, mean absolute error, and mean absolute percentage error of the proposed predictor decrease by at least 6.1%, 14.7%, and 34.2% compared with state-of-the-art baseline methods.
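
The recasting of a one-dimensional sequence into a 2D block described above can be sketched as follows. This is a minimal illustration and not the ArrivalNet implementation: the function names, the zero padding of an incomplete last period, the fixed period of 4, and the sample values are assumptions; the abstract does not state how periods are chosen.

```python
def to_2d_block(series, period, pad_value=0.0):
    """Recast a 1D sequence into rows of length `period`.

    Row r holds the time points of the r-th period (intra-periodic variation);
    column c holds the points at intra-periodic index c across periods
    (inter-periodic variation). The tail is padded so the last row is complete.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    values = list(series)
    remainder = len(values) % period
    if remainder:
        values.extend([pad_value] * (period - remainder))
    return [values[i:i + period] for i in range(0, len(values), period)]


def column(block, index):
    """Return the values at one intra-periodic index across all periods."""
    return [row[index] for row in block]


# Hypothetical delays (seconds), with an assumed period of 4 time steps.
delays = [10, 20, 30, 40, 11, 21, 31, 41, 12, 22]
block = to_2d_block(delays, period=4)
first_index_across_periods = column(block, 0)
```

Reading along a row follows one period, while reading down a column compares the same position in successive periods, which is what gives the block its image-like layout.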

LGDec 22, 2023
Spatiotemporal-Linear: Towards Universal Multivariate Time Series Forecasting

Aiyinsi Zuo, Haixi Zhang, Zirui Li et al.

Within the field of complicated multivariate time series forecasting (TSF), popular techniques frequently rely on intricate deep learning architectures, ranging from transformer-based designs to recurrent neural networks. However, recent findings suggest that simple Linear models can surpass sophisticated constructs on diverse datasets. These models directly map observation to multiple future time steps, thereby minimizing error accumulation in iterative multi-step prediction. Yet, these models fail to incorporate spatial and temporal information within the data, which is critical for capturing patterns and dependencies that drive insightful predictions. This oversight often leads to performance bottlenecks, especially under specific sequence lengths and dataset conditions, preventing their universal application. In response, we introduce the SpatioTemporal-Linear (STL) framework. STL seamlessly integrates time-embedded and spatially-informed bypasses to augment the Linear-based architecture. These extra routes offer a more robust and refined regression to the data, particularly when the amount of observation is limited and the capacity of simple linear layers to capture dependencies declines. Empirical evidence highlights STL's prowess, outpacing both Linear and Transformer benchmarks across varied observation and prediction durations and datasets. Such robustness accentuates its suitability across a spectrum of applications, including but not limited to, traffic trajectory and rare disease progression forecasting. Through this discourse, we not only validate the STL's distinctive capacities to become a more general paradigm in multivariate time-series prediction using deep-learning techniques but also stress the need to tackle data-scarce prediction scenarios for universal application. Code will be made available.

CLJan 4
JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese Large Language Models

Junyu Liu, Zirui Li, Qian Niu et al.

As Large Language Models (LLMs) are increasingly deployed in the healthcare field, it becomes essential to carefully evaluate their medical safety before clinical use. However, existing safety benchmarks remain predominantly English-centric and test with only single-turn prompts, even though clinical consultations are multi-turn. To address these gaps, we introduce JMedEthicBench, the first multi-turn conversational benchmark for evaluating medical safety of LLMs for Japanese healthcare. Our benchmark is based on 67 guidelines from the Japan Medical Association and contains over 50,000 adversarial conversations generated using seven automatically discovered jailbreak strategies. Using a dual-LLM scoring protocol, we evaluate 27 models and find that commercial models maintain robust safety while medical-specialized models exhibit increased vulnerability. Furthermore, safety scores decline significantly across conversation turns (median: 9.5 to 5.0, p < 0.001). Cross-lingual evaluation on both Japanese and English versions of our benchmark reveals that medical model vulnerabilities persist across languages, indicating inherent alignment limitations rather than language-specific factors. These findings suggest that domain-specific fine-tuning may inadvertently weaken safety mechanisms and that multi-turn interactions represent a distinct threat surface requiring dedicated alignment strategies.

AISep 30, 2025
Cooperative Autonomous Driving in Diverse Behavioral Traffic: A Heterogeneous Graph Reinforcement Learning Approach

Qi Liu, Xueyuan Li, Zirui Li et al.

Navigating heterogeneous traffic environments with diverse driving styles poses a significant challenge for autonomous vehicles (AVs) due to their inherent complexity and dynamic interactions. This paper addresses this challenge by proposing a heterogeneous graph reinforcement learning (GRL) framework enhanced with an expert system to improve AV decision-making performance. Initially, a heterogeneous graph representation is introduced to capture the intricate interactions among vehicles. Then, a heterogeneous graph neural network with an expert model (HGNN-EM) is proposed to effectively encode diverse vehicle features and produce driving instructions informed by domain-specific knowledge. Moreover, the double deep Q-learning (DDQN) algorithm is utilized to train the decision-making model. A case study on a typical four-way intersection, involving various driving styles of human vehicles (HVs), demonstrates that the proposed method has superior performance over several baselines regarding safety, efficiency, stability, and convergence rate, all while maintaining favorable real-time performance.
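
For readers unfamiliar with the training rule mentioned above, the standard Double DQN bootstrap target can be sketched as below. This shows the general algorithm only, not the paper's HGNN-EM pipeline: the function name, the plain-list Q-values, and the example numbers are assumptions made for illustration.

```python
def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN bootstrap target for a single transition.

    The online network selects the next action (argmax over its Q-values),
    and the target network evaluates that action. Decoupling selection from
    evaluation reduces the overestimation bias of the plain max operator.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for three actions in the next state.
q_online = [1.0, 3.0, 2.0]   # online network prefers action 1
q_target = [5.0, 0.5, 4.0]   # target network values action 1 at 0.5

target = double_dqn_target(1.0, False, q_online, q_target, gamma=0.9)
vanilla = 1.0 + 0.9 * max(q_target)   # plain DQN target, shown for comparison
```

In this example the Double DQN target is lower than the plain DQN target because the action is chosen by the online network instead of by the largest target-network value.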

LGAug 31, 2025
Exploring Over-stationarization in Deep Learning-based Bus/Tram Arrival Time Prediction: Analysis and Non-stationary Effect Recovery

Zirui Li, Bin Yang, Meng Wang

Arrival time prediction (ATP) of public transport vehicles is essential in improving passenger experience and supporting traffic management. Deep learning has demonstrated outstanding performance in ATP due to its ability to model non-linear and temporal dynamics. In multi-step ATP, non-stationary data degrade model performance because the joint distribution of the variables varies along the temporal direction. Previous studies mainly applied normalization to eliminate the non-stationarity in time series, thereby achieving better predictability. However, normalization may obscure useful characteristics inherent in non-stationarity, a problem known as over-stationarization. In this work, to trade off predictability and non-stationarity, a new approach for multi-step ATP, named non-stationary ATP (NSATP), is proposed. The method consists of two stages: series stationarization and non-stationarity effect recovery. The first stage aims at improving predictability. In the second stage, NSATP extends a state-of-the-art method from one-dimensional to two-dimensional models to capture the hidden periodicity in time series, and designs a compensation module for over-stationarization that learns scaling and shifting factors from raw data. Operational public transport data covering 125 days in Dresden are collected for validation. Experimental results show that compared to baseline methods, the proposed NSATP can reduce RMSE, MAE, and MAPE by 2.37%, 1.22%, and 2.26% for trams and by 1.72%, 0.60%, and 1.17% for buses, respectively.

LGAug 27, 2025
Complementary Learning System Empowers Online Continual Learning of Vehicle Motion Forecasting in Smart Cities

Zirui Li, Yunlong Lin, Guodong Du et al.

Artificial intelligence underpins most smart city services, yet deep neural networks (DNNs) that forecast vehicle motion still struggle with catastrophic forgetting, the loss of earlier knowledge when models are updated. Conventional fixes enlarge the training set or replay past data, but these strategies incur high data collection costs, sample inefficiently and fail to balance long- and short-term experience, leaving them short of human-like continual learning. Here we introduce Dual-LS, a task-free, online continual learning paradigm for DNN-based motion forecasting that is inspired by the complementary learning system of the human brain. Dual-LS pairs two synergistic memory rehearsal replay mechanisms to accelerate experience retrieval while dynamically coordinating long-term and short-term knowledge representations. Tests on naturalistic data spanning three countries, over 772,000 vehicles and cumulative testing mileage of 11,187 km show that Dual-LS mitigates catastrophic forgetting by up to 74.31% and reduces computational resource demand by up to 94.02%, markedly boosting predictive stability in vehicle motion forecasting without inflating data requirements. Meanwhile, it endows DNN-based vehicle motion forecasting with computation-efficient, human-like continual learning adaptability fit for smart cities.

SEAug 22, 2025
LLM-Assisted Semantic Alignment and Integration in Collaborative Model-Based Systems Engineering Using SysML v2

Zirui Li, Stephan Husung, Haoze Wang

Cross-organizational collaboration in Model-Based Systems Engineering (MBSE) faces many challenges in achieving semantic alignment across independently developed system models. SysML v2 introduces enhanced structural modularity and formal semantics, offering a stronger foundation for interoperable modeling. Meanwhile, GPT-based Large Language Models (LLMs) provide new capabilities for assisting model understanding and integration. This paper proposes a structured, prompt-driven approach for LLM-assisted semantic alignment of SysML v2 models. The core contribution lies in the iterative development of an alignment approach and interaction prompts, incorporating model extraction, semantic matching, and verification. The approach leverages SysML v2 constructs such as alias, import, and metadata extensions to support traceable, soft alignment integration. It is demonstrated with a GPT-based LLM through an example of a measurement system. Benefits and limitations are discussed.

LGJul 21, 2025
Red-Team Multi-Agent Reinforcement Learning for Emergency Braking Scenario

Yinsong Chen, Kaifeng Wang, Xiaoqiang Meng et al.

Current research on decision-making in safety-critical scenarios often relies on inefficient data-driven scenario generation or specific modeling approaches, which fail to capture corner cases in real-world contexts. To address this issue, we propose a Red-Team Multi-Agent Reinforcement Learning framework, where background vehicles with interference capabilities are treated as red-team agents. Through active interference and exploration, red-team vehicles can uncover corner cases outside the data distribution. The framework uses a Constraint Graph Representation Markov Decision Process, ensuring that red-team vehicles comply with safety rules while continuously disrupting the autonomous vehicles (AVs). A policy threat zone model is constructed to quantify the threat posed by red-team vehicles to AVs, inducing more extreme actions to increase the danger level of the scenario. Experimental results show that the proposed framework significantly impacts AVs decision-making safety and generates various corner cases. This method also offers a novel direction for research in safety-critical scenarios.

LGOct 21, 2021
QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization

Hanrui Wang, Jiaqi Gu, Yongshan Ding et al.

Parameterized Quantum Circuits (PQC) are promising towards quantum advantage on near-term quantum hardware. However, due to the large quantum noises (errors), the performance of PQC models has a severe degradation on real quantum devices. Take Quantum Neural Network (QNN) as an example, the accuracy gap between noise-free simulation and noisy results on IBMQ-Yorktown for MNIST-4 classification is over 60%. Existing noise mitigation methods are general ones without leveraging unique characteristics of PQC; on the other hand, existing PQC work does not consider noise effect. To this end, we present QuantumNAT, a PQC-specific framework to perform noise-aware optimizations in both training and inference stages to improve robustness. We experimentally observe that the effect of quantum noise to PQC measurement outcome is a linear map from noise-free outcome with a scaling and a shift factor. Motivated by that, we propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve the robustness against noise, we propose noise injection to the training process by inserting quantum error gates to PQC according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving the denoising effect. Extensive experiments on 8 classification tasks using 6 quantum devices demonstrate that QuantumNAT improves accuracy by up to 43%, and achieves over 94% 2-class, 80% 4-class, and 34% 10-class classification accuracy measured on real quantum computers. The code for construction and noise-aware training of PQC is available in the TorchQuantum library.
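
The reasoning behind post-measurement normalization and quantization can be illustrated with a small sketch. It is not the TorchQuantum implementation: normalizing one qubit's outcomes across a batch, the five quantization levels, the clipping range, and the noise parameters (scale 0.5, shift 0.1) are all assumptions chosen for the example. The point is that if noise acts as a linear map with a positive scale and a shift, standardizing the outcomes cancels both factors.

```python
from statistics import mean, pstdev


def normalize(outcomes, eps=1e-8):
    """Standardize measurement outcomes of one qubit across a batch.

    If noise maps a noise-free outcome y to a*y + b with a > 0, the
    standardized values are unchanged (up to eps), because the shift
    cancels in the mean and the scale cancels in the standard deviation.
    """
    mu = mean(outcomes)
    sigma = pstdev(outcomes)
    return [(y - mu) / (sigma + eps) for y in outcomes]


def quantize(values, levels=5, low=-2.0, high=2.0):
    """Clip to [low, high] and snap each value to the nearest of `levels` evenly spaced points."""
    step = (high - low) / (levels - 1)
    snapped = []
    for v in values:
        clipped = min(max(v, low), high)
        snapped.append(low + round((clipped - low) / step) * step)
    return snapped


clean = [0.8, -0.4, 0.1, -0.6]           # hypothetical noise-free expectation values
noisy = [0.5 * y + 0.1 for y in clean]   # assumed linear noise: scale 0.5, shift 0.1

clean_q = quantize(normalize(clean))
noisy_q = quantize(normalize(noisy))
```

Real device noise is only approximately linear, so quantization serves as a second step that absorbs small residual deviations by snapping values to discrete levels.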

ROSep 15, 2021
Sequential Point Cloud Prediction in Interactive Scenarios: A Survey

Haowen Wang, Zirui Li, Jianwei Gong

Point clouds have been widely used in the field of autonomous driving since they can provide a more comprehensive three-dimensional representation of the environment than 2D images. Point-wise prediction based on point cloud sequence (PCS) is an essential part of environment understanding, which can assist in the decision-making and motion-planning of autonomous vehicles. However, PCS prediction has not been deeply researched in the literature. This paper presents a brief review of sequential point cloud prediction methods, focusing on interactive scenarios. Firstly, we define the PCS prediction problem and introduce commonly used frameworks. Secondly, by reviewing non-predictive problems, we analyze and summarize the spatio-temporal feature extraction methods based on PCS. On this basis, we review two types of PCS prediction tasks, scene flow estimation (SFE) and point cloud location prediction (PCLP), highlighting their connections and differences. Finally, we discuss some open issues and point out potential research directions.

ROSep 15, 2021
Life-Long Multi-Task Learning of Adaptive Path Tracking Policy for Autonomous Vehicle

Cheng Gong, Jianwei Gong, Chao Lu et al.

This paper proposes a life-long adaptive path tracking policy learning method for autonomous vehicles that can self-evolve and self-adapt with multi-task knowledge. Firstly, the proposed method can learn a model-free control policy for path tracking directly from the historical driving experience, where the property of vehicle dynamics and corresponding control strategy can be learned simultaneously. Secondly, by utilizing the life-long learning method, the proposed method can learn the policy with task-incremental knowledge without encountering catastrophic forgetting. Thus, with continual multi-task knowledge learned, the policy can iteratively adapt to new tasks and improve its performance with knowledge from new tasks. Thirdly, a memory evaluation and updating method is applied to optimize memory structure for life-long learning which enables the policy to learn toward selected directions. Experiments are conducted using a high-fidelity vehicle dynamic model in a complex curvy road to evaluate the performance of the proposed method. Results show that the proposed method can effectively evolve with continual multi-task knowledge and adapt to the new environment, where the performance of the proposed method can also surpass two commonly used baseline methods after evolving.

CRSep 8, 2021
Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection

Shulai Zhang, Zirui Li, Quan Chen et al.

Federated learning (FL) is a distributed machine learning paradigm that allows clients to collaboratively train a model over their own local data. FL promises the privacy of clients and its security can be strengthened by cryptographic methods such as additively homomorphic encryption (HE). However, the efficiency of FL could seriously suffer from the statistical heterogeneity in both the data distribution discrepancy among clients and the global distribution skewness. We mathematically demonstrate the cause of performance degradation in FL and examine the performance of FL over various datasets. To tackle the statistical heterogeneity problem, we propose a pluggable system-level client selection method named Dubhe, which allows clients to proactively participate in training, meanwhile preserving their privacy with the assistance of HE. Experimental results show that Dubhe is comparable with the optimal greedy method on the classification accuracy, with negligible encryption and communication overhead.

ROAug 2, 2021
Orientation-Aware Planning for Parallel Task Execution of Omni-Directional Mobile Robot

Cheng Gong, Zirui Li, Xingyu Zhou et al.

Omni-directional mobile robot (OMR) systems have been very popular in academia and industry for their superb maneuverability and flexibility. Yet their potential has not been fully exploited, where the extra degree of freedom in OMR can potentially enable the robot to carry out extra tasks. For instance, gimbals or sensors on robots may suffer from a limited field of view or be constrained by the inherent mechanical design, which will require the chassis to be orientation-aware and respond in time. To solve this problem and further develop the OMR systems, in this paper, we categorize the tasks related to OMR chassis into orientation transition tasks and position transition tasks, where the two tasks can be carried out at the same time. By integrating the parallel task goals in a single planning problem, we proposed an orientation-aware planning architecture for OMR systems to execute the orientation transition and position transition in a unified and efficient way. A modified trajectory optimization method called orientation-aware timed-elastic-band (OATEB) is introduced to generate the trajectory that satisfies the requirements of both tasks. Experiments in both 2D simulated environments and real scenes are carried out. A four-wheeled OMR is deployed to conduct the real scene experiment and the results demonstrate that the proposed method is capable of simultaneously executing parallel tasks and is applicable to real-life scenarios.

ROJul 2, 2021
Decision-Making Technology for Autonomous Vehicles: Learning-Based Methods, Applications and Future Outlook

Qi Liu, Xueyuan Li, Shihua Yuan et al.

Autonomous vehicles have great potential in both civil and military applications, and have become a focus of research with the rapid development of science and the economy. This article presents a brief review of learning-based decision-making technology for autonomous vehicles, since it is significant for their safe and efficient performance. Firstly, a basic outline of decision-making technology is provided. Secondly, related works on learning-based decision-making methods for autonomous vehicles are reviewed and compared with classical decision-making methods. In addition, applications of decision-making methods in existing autonomous vehicles are summarized. Finally, promising topics for future research on decision-making technology for autonomous vehicles are discussed.

RO Jun 24, 2021
Autonomous Driving Strategies at Intersections: Scenarios, State-of-the-Art, and Future Outlooks

Lianzhen Wei, Zirui Li, Jianwei Gong et al.

Due to the complex and dynamic character of intersection scenarios, the autonomous driving strategy at intersections has been a difficult problem and a hot topic in intelligent transportation systems research in recent years. This paper gives a brief summary of state-of-the-art autonomous driving strategies at intersections. Firstly, we enumerate and analyze common types of intersection scenarios, corresponding simulation platforms, and related datasets. Secondly, by reviewing previous studies, we summarize the characteristics of existing autonomous driving strategies and classify them into several categories. Finally, we point out problems of the existing autonomous driving strategies and put forward several valuable research outlooks.

CV Sep 17, 2020
High-precision target positioning system for unmanned vehicles based on binocular vision

Xianqi He, Zirui Li, Xufeng Yin et al.

Unmanned vehicles often need to locate targets with high precision during work. In an unmanned material handling workshop, the unmanned vehicle needs to perform high-precision pose estimation of the workpiece in order to grasp it accurately. In this context, this paper proposes a high-precision target positioning system for unmanned vehicles based on binocular vision. The system uses a region-based stereo matching algorithm to obtain a disparity map, and uses the RANSAC algorithm to extract position and posture features, which achieves six-degree-of-freedom estimation of the position and attitude of a cylindrical workpiece. To verify the effectiveness of the system, this paper measures the accuracy and computation time of the output results for the cylinder in different poses. The experimental data show that the position accuracy of the system is 0.61 to 1.17 mm and the angular accuracy is 1.95 to 5.13°, which achieves high-precision positioning.
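
The RANSAC step mentioned above can be illustrated with a much simpler stand-in. The abstract does not describe how the system applies RANSAC to the cylindrical workpiece, so the sketch below only fits a 2D line to points contaminated by outliers. The function name `fit_line_ransac`, its parameters, and the sample data are all hypothetical and are not taken from the paper.

```python
import math
import random


def fit_line_ransac(points, n_iters=200, inlier_tol=0.05, seed=0):
    """Fit a 2D line a*x + b*y + c = 0 (with a^2 + b^2 = 1) by RANSAC.

    Illustrative only: repeatedly samples two points, builds the line
    through them, and keeps the line with the most inliers.
    """
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # Normal vector of the line through the two sampled points.
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0.0:
            continue  # degenerate sample (identical points)
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # Inliers are points whose distance to the line is within tolerance.
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


# Hypothetical data: 20 points on y = 0.5*x + 1 plus 5 gross outliers.
line_points = [(i * 0.1, 0.5 * (i * 0.1) + 1.0) for i in range(20)]
outliers = [(0.3, 5.0), (1.0, -3.0), (1.5, 4.0), (0.7, -2.0), (1.9, 6.0)]
line, inliers = fit_line_ransac(line_points + outliers)
slope = -line[0] / line[1]
```

The same sample, score, and keep-best loop carries over to other geometric models, with the two-point line replaced by a minimal sample for that model.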

RO Sep 1, 2020
Autonomous Formula Racecar: Overall System Design and Experimental Validation

Hanqing Tian, Jun Ni, Zirui Li et al.

This paper summarizes the work of building an integrated autonomous system, including the perception system and vehicle dynamic controller, for a formula student autonomous racecar. We propose a system framework combining X-by-wire modification, perception and motion planning, and vehicle dynamic control as an easily replicated template for an FSAC racecar. A LiDAR-vision cooperative method is proposed for detecting traffic cones, which are used as track markers. The detection algorithm of the racecar also implements a precise, high-rate localization method that combines GPS-INS data and LiDAR odometry. In addition, a track map including the location and color information of the cones is built simultaneously. Finally, the system and vehicle performance is tested on a closed-loop track. This paper also briefly introduces the Formula Student Autonomous Competition (FSAC).

CV Jul 11, 2020
Driver Behavior Modelling at the Urban Intersection via Canonical Correlation Analysis

Zirui Li, Chao Lu, Cheng Gong et al.

The urban intersection is a typical dynamic and complex scenario for intelligent vehicles, involving a variety of driving behaviors and traffic participants. Accurately modelling driver behavior at the intersection is essential for intelligent transportation systems (ITS). Previous research mainly focuses on using attention mechanisms to model the degree of correlation. In this research, a canonical correlation analysis (CCA)-based framework is proposed. The value of the canonical correlation is used for feature selection. A Gaussian mixture model and Gaussian process regression are applied for driver behavior modelling. Two experiments using simulated and naturalistic driving data are designed for verification. Experimental results are consistent with the driver's judgment. Comparative studies show that the proposed framework achieves better performance.
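
As a rough illustration of using a canonical correlation value for feature selection, the sketch below scores one candidate feature against a two-dimensional target. In this special case the first canonical correlation equals the multiple correlation coefficient and has a closed form. The function names, the synthetic data, and the one-feature-versus-two-targets setup are assumptions made for illustration. The abstract does not specify the paper's features or selection rule.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def canonical_corr_1d_vs_2d(x, y1, y2):
    """First canonical correlation between scalar x and the pair (y1, y2).

    Assumes y1 and y2 are not perfectly collinear (|r12| < 1).
    """
    r1, r2, r12 = pearson(x, y1), pearson(x, y2), pearson(y1, y2)
    r_sq = (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)
    return math.sqrt(min(max(r_sq, 0.0), 1.0))  # clamp rounding error


# Hypothetical data: one feature drives both targets, the other is noise.
rng = random.Random(0)
n = 400
relevant = [rng.gauss(0.0, 1.0) for _ in range(n)]
irrelevant = [rng.gauss(0.0, 1.0) for _ in range(n)]
y1 = [2.0 * v + 0.1 * rng.gauss(0.0, 1.0) for v in relevant]
y2 = [-v + 0.5 * rng.gauss(0.0, 1.0) for v in relevant]

scores = {
    "relevant": canonical_corr_1d_vs_2d(relevant, y1, y2),
    "irrelevant": canonical_corr_1d_vs_2d(irrelevant, y1, y2),
}
# Features with a higher canonical correlation would be kept.
ranked = sorted(scores, key=scores.get, reverse=True)
```

For feature sets with more than one column on both sides, the closed form no longer applies and a general CCA solver would be needed.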

CV Jul 4, 2020
A Survey on Sensor Technologies for Unmanned Ground Vehicles

Qi Liu, Shihua Yuan, Zirui Li

Unmanned ground vehicles (UGVs) have huge development potential in both civilian and military fields, and have become a focus of research in various countries. High-precision, high-reliability sensors are significant for the efficient operation of UGVs. This paper presents a brief review of sensor technologies for UGVs. Firstly, the characteristics of various sensors are introduced. Then the strengths and weaknesses of different sensors, as well as their application scenarios, are compared. Furthermore, sensor applications in some existing UGVs are summarized. Finally, the hotspots of sensor technologies are forecast to indicate directions for development.

RO Aug 31, 2019
From perception to control: an autonomous driving system for a formula student driverless car

Tairan Chen, Zirui Li, Yiting He et al.

This paper introduces the autonomous system of the "Smart Shark II", which won the Formula Student Autonomous China (FSAC) Competition in 2018. In this competition, an autonomous racecar is required to autonomously complete two laps of an unknown track. In this paper, the authors present the self-driving software structure of this racecar, which ensures high vehicle speed and safety. The key components ensure stable driving of the racecar: LiDAR-based and vision-based cone detection provide redundant perception; the EKF-based localization offers high-accuracy, high-frequency state estimation; and perception results are accumulated in time and space by an occupancy grid map. After the trajectory is obtained, a model predictive control algorithm is used to optimize both the longitudinal and lateral control of the racecar. Finally, the performance in an experiment based on real-world data is shown.

CL Dec 13, 2018
Towards a General-Purpose Linguistic Annotation Backend

Graham Neubig, Patrick Littell, Chian-Yu Chen et al.

Language documentation is inherently a time-intensive process; transcription, glossing, and corpus management consume a significant portion of documentary linguists' work. Advances in natural language processing can help to accelerate this work, using the linguists' past decisions as training material, but questions remain about how to prioritize human involvement. In this extended abstract, we describe the beginnings of a new project that will attempt to ease this language documentation process through the use of natural language processing (NLP) technology. It is based on (1) methods to adapt NLP tools to new languages, based on recent advances in massively multilingual neural networks, and (2) backend APIs and interfaces that allow linguists to upload their data. We then describe our current progress on two fronts: automatic phoneme transcription, and glossing. Finally, we briefly describe our future directions.

LG Jun 2, 2018
Learning and Generalizing Motion Primitives from Driving Data for Path-Tracking Applications

Boyang Wang, Zirui Li, Jianwei Gong et al.

Considering driving habits learned from naturalistic driving data in the path-tracking system can significantly improve the acceptance of intelligent vehicles. Therefore, the goal of this paper is to predict lateral commands with confidence regions, according to the reference, based on the learned motion primitives. We present a two-level structure for learning and generalizing motion primitives through demonstrations. The lower-level motion primitives are generated under the path segmentation and clustering layer of the upper level. A Gaussian Mixture Model (GMM) is used to represent the primitives, and Gaussian Mixture Regression (GMR) is selected to generalize them. We show how the upper level can help to improve the prediction accuracy, and we evaluate the influence of different time scales and of the number of Gaussian components. The model is trained and validated using driving data collected from the Beijing Institute of Technology (BIT) intelligent vehicle platform. Experimental results show that the proposed method can extract motion primitives from the driving data and predict future lateral control commands with high accuracy.
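
To make the GMM and GMR step concrete, here is a minimal sketch of Gaussian mixture regression for a scalar input and a scalar output. It assumes a mixture over the joint (x, y) has already been fitted. The component parameters below are made up for illustration and are not from the paper, and the sketch computes only the conditional mean, not the confidence regions mentioned in the abstract.

```python
import math

# Hypothetical, already-fitted 2-component GMM over the joint (x, y).
# Each tuple: (weight, mean_x, mean_y, var_x, cov_xy).
COMPONENTS = [
    (0.5, 0.0, 0.0, 1.0, 0.5),
    (0.5, 10.0, 5.0, 1.0, -0.5),
]


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x, components=COMPONENTS):
    """Conditional mean E[y | x] under the joint mixture.

    Each component contributes its own linear conditional mean, weighted
    by how responsible that component is for the query input x.
    Note: if x is extremely far from every component, the weights can
    underflow to zero; this sketch does not guard against that.
    """
    weights = [w * gaussian_pdf(x, mx, vx) for w, mx, _, vx, _ in components]
    total = sum(weights)
    cond_means = [my + (cxy / vx) * (x - mx) for _, mx, my, vx, cxy in components]
    return sum(h * m for h, m in zip(weights, cond_means)) / total


# Near the first component the prediction follows its regression line
# (slope 0.5); halfway between the components both contribute equally.
pred_near_first = gmr_predict(1.0)
pred_midway = gmr_predict(5.0)
```

In practice the mixture parameters would come from fitting the GMM to demonstration data, for example with expectation-maximization, before regression is applied.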