Weichao Pan

CV
h-index3
7papers
38citations
Novelty48%
AI Score34

7 Papers

CVSep 3, 2024
DAPONet: A Dual Attention and Partially Overparameterized Network for Real-Time Road Damage Detection

Weichao Pan, Jiaju Kang, Xu Wang et al.

Current road damage detection methods, relying on manual inspections or sensor-mounted vehicles, are inefficient, limited in coverage, and often inaccurate, especially for minor damages, leading to delays and safety hazards. To address these issues and enhance real-time road damage detection using street view image data (SVRDD), we propose DAPONet, a model incorporating three key modules: a dual attention mechanism combining global and local attention, a multi-scale partial over-parameterization module, and an efficient downsampling module. DAPONet achieves a mAP50 of 70.1% on the SVRDD dataset, outperforming YOLOv10n by 10.4%, while reducing parameters to 1.6M and FLOPs to 1.7G, representing reductions of 41% and 80%, respectively. On the MS COCO2017 val dataset, DAPONet achieves an mAP50-95 of 33.4%, 0.8% higher than EfficientDet-D1, with a 74% reduction in both parameters and FLOPs.

CVSep 4, 2024
Real-Time Dynamic Scale-Aware Fusion Detection Network: Take Road Damage Detection as an example

Weichao Pan, Xu Wang, Wenqing Huan

Unmanned Aerial Vehicle (UAV)-based Road Damage Detection (RDD) is important for daily maintenance and safety in cities, especially in terms of significantly reducing labor costs. However, current UAV-based RDD research is still faces many challenges. For example, the damage with irregular size and direction, the masking of damage by the background, and the difficulty of distinguishing damage from the background significantly affect the ability of UAV to detect road damage in daily inspection. To solve these problems and improve the performance of UAV in real-time road damage detection, we design and propose three corresponding modules: a feature extraction module that flexibly adapts to shape and background; a module that fuses multiscale perception and adapts to shape and background ; an efficient downsampling module. Based on these modules, we designed a multi-scale, adaptive road damage detection model with the ability to automatically remove background interference, called Dynamic Scale-Aware Fusion Detection Model (RT-DSAFDet). Experimental results on the UAV-PDD2023 public dataset show that our model RT-DSAFDet achieves a mAP50 of 54.2%, which is 11.1% higher than that of YOLOv10-m, an efficient variant of the latest real-time object detection model YOLOv10, while the amount of parameters is reduced to 1.8M and FLOPs to 4.6G, with a decreased by 88% and 93%, respectively. Furthermore, on the large generalized object detection public dataset MS COCO2017 also shows the superiority of our model with mAP50-95 is the same as YOLOv9-t, but with 0.5% higher mAP50, 10% less parameters volume, and 40% less FLOPs.

CVSep 19, 2024
EFA-YOLO: An Efficient Feature Attention Model for Fire and Flame Detection

Weichao Pan, Xu Wang, Wenqing Huan

As a natural disaster with high suddenness and great destructiveness, fire has long posed a major threat to human society and ecological environment. In recent years, with the rapid development of smart city and Internet of Things (IoT) technologies, fire detection systems based on deep learning have gradually become a key means to cope with fire hazards. However, existing fire detection models still have many challenges in terms of detection accuracy and real-time performance in complex contexts. To address these issues, we propose two key modules: EAConv (Efficient Attention Convolution) and EADown (Efficient Attention Downsampling). The EAConv module significantly improves the feature extraction efficiency by combining an efficient attention mechanism with depth-separable convolution, while the EADown module enhances the accuracy and efficiency of feature downsampling by utilizing spatial and channel attention mechanisms in combination with pooling operations. Based on these two modules, we design an efficient and lightweight flame detection model, EFA-YOLO (Efficient Feature Attention YOLO). Experimental results show that EFA-YOLO has a model parameter quantity of only 1.4M, GFLOPs of 4.6, and the inference time per image on the CPU is only 22.19 ms. Compared with existing mainstream models (e.g., YOLOv5, YOLOv8, YOLOv9, and YOLOv10), EFA-YOLO exhibits a significant enhancement in detection accuracy (mAP) and inference speed, with model parameter amount is reduced by 94.6 and the inference speed is improved by 88 times.

CVMay 27, 2025Code
YOLO-FireAD: Efficient Fire Detection via Attention-Guided Inverted Residual Learning and Dual-Pooling Feature Preservation

Weichao Pan, Bohan Xu, Xu Wang et al.

Fire detection in dynamic environments faces continuous challenges, including the interference of illumination changes, many false detections or missed detections, and it is difficult to achieve both efficiency and accuracy. To address the problem of feature extraction limitation and information loss in the existing YOLO-based models, this study propose You Only Look Once for Fire Detection with Attention-guided Inverted Residual and Dual-pooling Downscale Fusion (YOLO-FireAD) with two core innovations: (1) Attention-guided Inverted Residual Block (AIR) integrates hybrid channel-spatial attention with inverted residuals to adaptively enhance fire features and suppress environmental noise; (2) Dual Pool Downscale Fusion Block (DPDF) preserves multi-scale fire patterns through learnable fusion of max-average pooling outputs, mitigating small-fire detection failures. Extensive evaluation on two public datasets shows the efficient performance of our model. Our proposed model keeps the sum amount of parameters (1.45M, 51.8% lower than YOLOv8n) (4.6G, 43.2% lower than YOLOv8n), and mAP75 is higher than the mainstream real-time object detection models YOLOv8n, YOL-Ov9t, YOLOv10n, YOLO11n, YOLOv12n and other YOLOv8 variants 1.3-5.5%. For more details, please visit our repository: https://github.com/JEFfersusu/YOLO-FireAD

CVJul 31, 2025
YOLO-ROC: A High-Precision and Ultra-Lightweight Model for Real-Time Road Damage Detection

Zicheng Lin, Weichao Pan

Road damage detection is a critical task for ensuring traffic safety and maintaining infrastructure integrity. While deep learning-based detection methods are now widely adopted, they still face two core challenges: first, the inadequate multi-scale feature extraction capabilities of existing networks for diverse targets like cracks and potholes, leading to high miss rates for small-scale damage; and second, the substantial parameter counts and computational demands of mainstream models, which hinder their deployment for efficient, real-time detection in practical applications. To address these issues, this paper proposes a high-precision and lightweight model, YOLO - Road Orthogonal Compact (YOLO-ROC). We designed a Bidirectional Multi-scale Spatial Pyramid Pooling Fast (BMS-SPPF) module to enhance multi-scale feature extraction and implemented a hierarchical channel compression strategy to reduce computational complexity. The BMS-SPPF module leverages a bidirectional spatial-channel attention mechanism to improve the detection of small targets. Concurrently, the channel compression strategy reduces the parameter count from 3.01M to 0.89M and GFLOPs from 8.1 to 2.6. Experiments on the RDD2022_China_Drone dataset demonstrate that YOLO-ROC achieves a mAP50 of 67.6%, surpassing the baseline YOLOv8n by 2.11%. Notably, the mAP50 for the small-target D40 category improved by 16.8%, and the final model size is only 2.0 MB. Furthermore, the model exhibits excellent generalization performance on the RDD2022_China_Motorbike dataset.

LGFeb 4, 2025
RIE-SenseNet: Riemannian Manifold Embedding of Multi-Source Industrial Sensor Signals for Robust Pattern Recognition

Xu Wang, Puyu Han, Jiaju Kang et al.

Industrial sensor networks produce complex signals with nonlinear structure and shifting distributions. We propose RIE-SenseNet, a novel geometry-aware Transformer model that embeds sensor data in a Riemannian manifold to tackle these challenges. By leveraging hyperbolic geometry for sequence modeling and introducing a manifold-based augmentation technique, RIE-SenseNet preserves sensor signal structure and generates realistic synthetic samples. Experiments show RIE-SenseNet achieves >90% F1-score, far surpassing CNN and Transformer baselines. These results illustrate the benefit of combining non-Euclidean feature representations with geometry-consistent data augmentation for robust pattern recognition in industrial sensing.

IVNov 19, 2024
Translating Electrocardiograms to Cardiac Magnetic Resonance Imaging Useful for Cardiac Assessment and Disease Screening: A Multi-Center Study AI for ECG to CMR Translation Study

Zhengyao Ding, Ziyu Li, Yujian Hu et al.

Cardiovascular diseases (CVDs) are the leading cause of global mortality, necessitating accessible and accurate diagnostic tools. While cardiac magnetic resonance imaging (CMR) provides gold-standard insights into cardiac structure and function, its clinical utility is limited by high cost and complexity. In contrast, electrocardiography (ECG) is inexpensive and widely available but lacks the granularity of CMR. We propose CardioNets, a deep learning framework that translates 12-lead ECG signals into CMR-level functional parameters and synthetic images, enabling scalable cardiac assessment. CardioNets integrates cross-modal contrastive learning and generative pretraining, aligning ECG with CMR-derived cardiac phenotypes and synthesizing high-resolution CMR images via a masked autoregressive model. Trained on 159,819 samples from five cohorts, including the UK Biobank (n=42,483) and MIMIC-IV-ECG (n=164,550), and externally validated on independent clinical datasets (n=3,767), CardioNets achieved strong performance across disease screening and phenotype estimation tasks. In the UK Biobank, it improved cardiac phenotype regression R2 by 24.8% and cardiomyopathy AUC by up to 39.3% over baseline models. In MIMIC, it increased AUC for pulmonary hypertension detection by 5.6%. Generated CMR images showed 36.6% higher SSIM and 8.7% higher PSNR than prior approaches. In a reader study, ECG-only CardioNets achieved 13.9% higher accuracy than human physicians using both ECG and real CMR. These results suggest that CardioNets offers a promising, low-cost alternative to CMR for large-scale CVD screening, particularly in resource-limited settings. Future efforts will focus on clinical deployment and regulatory validation of ECG-based synthetic imaging.