Xiaoxi Hu

CV
h-index16
5papers
17citations
Novelty36%
AI Score37

5 Papers

CVApr 17, 2025Code
Collaborative Perception Datasets for Autonomous Driving: A Review

Naibang Wang, Deyong Shang, Yan Gong et al.

Collaborative perception has attracted growing interest from academia and industry due to its potential to enhance perception accuracy, safety, and robustness in autonomous driving through multi-agent information fusion. With the advancement of Vehicle-to-Everything (V2X) communication, numerous collaborative perception datasets have emerged, varying in cooperation paradigms, sensor configurations, data sources, and application scenarios. However, the absence of systematic summarization and comparative analysis hinders effective resource utilization and standardization of model evaluation. As the first comprehensive review focused on collaborative perception datasets, this work reviews and compares existing resources from a multi-dimensional perspective. We categorize datasets based on cooperation paradigms, examine their data sources and scenarios, and analyze sensor modalities and supported tasks. A detailed comparative analysis is conducted across multiple dimensions. We also outline key challenges and future directions, including dataset scalability, diversity, domain adaptation, standardization, privacy, and the integration of large language models. To support ongoing research, we provide a continuously updated online repository of collaborative perception datasets and related literature: https://github.com/frankwnb/Collaborative-Perception-Datasets-for-Autonomous-Driving.

QMMar 13, 2023
CoGANPPIS: A Coevolution-enhanced Global Attention Neural Network for Protein-Protein Interaction Site Prediction

Jiaxing Guo, Xuening Zhu, Zixin Hu et al.

Protein-protein interactions are of great importance in biochemical processes. Accurate prediction of protein-protein interaction sites (PPIs) is crucial for our understanding of biological mechanism. Although numerous approaches have been developed recently and achieved gratifying results, there are still two limitations: (1) Most existing models have excavated a number of useful input features, but failed to take coevolutionary features into account, which could provide clues for inter-residue relationships; (2) The attention-based models only allocate attention weights for neighboring residues, instead of doing it globally, which may limit the model's prediction performance since some residues being far away from the target residues might also matter. We propose a coevolution-enhanced global attention neural network, a sequence-based deep learning model for PPIs prediction, called CoGANPPIS. Specifically, CoGANPPIS utilizes three layers in parallel for feature extraction: (1) Local-level representation aggregation layer, which aggregates the neighboring residues' features as the local feature representation; (2) Global-level representation learning layer, which employs a novel coevolution-enhanced global attention mechanism to allocate attention weights to all residues on the same protein sequences; (3) Coevolutionary information learning layer, which applies CNN & pooling to coevolutionary information to obtain the coevolutionary profile representation. Then, the three outputs are concatenated and passed into several fully connected layers for the final prediction. Extensive experiments on two benchmark datasets have been conducted, demonstrating that our proposed model achieves the state-of-the-art performance.

ROAug 11, 2025
Progressive Bird's Eye View Perception for Safety-Critical Autonomous Driving: A Comprehensive Survey

Yan Gong, Naibang Wang, Jianli Lu et al.

Bird's-Eye-View (BEV) perception has become a foundational paradigm in autonomous driving, enabling unified spatial representations that support robust multi-sensor fusion and multi-agent collaboration. As autonomous vehicles transition from controlled environments to real-world deployment, ensuring the safety and reliability of BEV perception in complex scenarios - such as occlusions, adverse weather, and dynamic traffic - remains a critical challenge. This survey provides the first comprehensive review of BEV perception from a safety-critical perspective, systematically analyzing state-of-the-art frameworks and implementation strategies across three progressive stages: single-modality vehicle-side, multimodal vehicle-side, and multi-agent collaborative perception. Furthermore, we examine public datasets encompassing vehicle-side, roadside, and collaborative settings, evaluating their relevance to safety and robustness. We also identify key open-world challenges - including open-set recognition, large-scale unlabeled data, sensor degradation, and inter-agent communication latency - and outline future research directions, such as integration with end-to-end autonomous driving systems, embodied intelligence, and large language models.

CVSep 16, 2025
Recurrent Cross-View Object Geo-Localization

Xiaohan Zhang, Si-Yuan Cao, Xiaokai Bai et al.

Cross-view object geo-localization (CVOGL) aims to determine the location of a specific object in high-resolution satellite imagery given a query image with a point prompt. Existing approaches treat CVOGL as a one-shot detection task, directly regressing object locations from cross-view information aggregation, but they are vulnerable to feature noise and lack mechanisms for error correction. In this paper, we propose ReCOT, a Recurrent Cross-view Object geo-localization Transformer, which reformulates CVOGL as a recurrent localization task. ReCOT introduces a set of learnable tokens that encode task-specific intent from the query image and prompt embeddings, and iteratively attend to the reference features to refine the predicted location. To enhance this recurrent process, we incorporate two complementary modules: (1) a SAM-based knowledge distillation strategy that transfers segmentation priors from the Segment Anything Model (SAM) to provide clearer semantic guidance without additional inference cost, and (2) a Reference Feature Enhancement Module (RFEM) that introduces a hierarchical attention to emphasize object-relevant regions in the reference features. Extensive experiments on standard CVOGL benchmarks demonstrate that ReCOT achieves state-of-the-art (SOTA) performance while reducing parameters by 60% compared to previous SOTA approaches.

SPJun 1, 2025
LD-RPMNet: Near-Sensor Diagnosis for Railway Point Machines

Wei Li, Xiaochun Wu, Xiaoxi Hu et al.

Near-sensor diagnosis has become increasingly prevalent in industry. This study proposes a lightweight model named LD-RPMNet that integrates Transformers and Convolutional Neural Networks, leveraging both local and global feature extraction to optimize computational efficiency for a practical railway application. The LD-RPMNet introduces a Multi-scale Depthwise Separable Convolution (MDSC) module, which decomposes cross-channel convolutions into pointwise and depthwise convolutions while employing multi-scale kernels to enhance feature extraction. Meanwhile, a Broadcast Self-Attention (BSA) mechanism is incorporated to simplify complex matrix multiplications and improve computational efficiency. Experimental results based on collected sound signals during the operation of railway point machines demonstrate that the optimized model reduces parameter count and computational complexity by 50% while improving diagnostic accuracy by nearly 3%, ultimately achieving an accuracy of 98.86%. This demonstrates the possibility of near-sensor fault diagnosis applications in railway point machines.