Luming Yang

CV
h-index4
3papers
Novelty47%
AI Score46

3 Papers

38.4CVApr 30Code
Robust Lightweight Crack Classification for Real-Time UAV Bridge Inspection

Wei Li, Haisheng Li, Weijie Li et al.

With the widespread application of Unmanned Aerial Vehicles (UAVs) in bridge structural health monitoring, deep learning-based automatic crack detection has become a major research focus. However, practical UAV inspections still face four key challenges: weak crack features, degraded imaging conditions, severe class imbalance, and limited computational resources for practical UAV inspection workflows. To address these issues, this paper proposes a unified lightweight convolutional neural network framework composed of four synergistic components: a lightweight backbone network, a Convolutional Block Attention Module (CBAM) for channel and spatial enhancement, a directed robust augmentation strategy based on inspection-scene priors, and Focal Loss for hard-sample learning under class imbalance. Experiments on the SDNET2018 bridge deck dataset show that the proposed method achieves an inference speed of 825 FPS with only 11.21M parameters and 1.82G FLOPs. Compared with the baseline model, the complete framework improves the F1-score by 2.51% and recall by 3.95%. In addition, Grad-CAM visualizations indicate that the introduced attention module shifts the model's focus from scattered regions to precise tracking along crack trajectories. Overall, this study achieves a strong balance among accuracy, speed, and robustness, providing a practical solution for ground-station assisted real-time deployment in UAV bridge inspections. The source code is available at: https://github.com/skylynf/AttXNet .

LGOct 11, 2025Code
Bidirectional Time-Frequency Pyramid Network for Enhanced Robust EEG Classification

Jiahui Hong, Siqing Li, Muqing Jian et al.

Existing EEG recognition models suffer from poor cross-paradigm generalization due to dataset-specific constraints and individual variability. To overcome these limitations, we propose BITE (Bidirectional Time-Freq Pyramid Network), an end-to-end unified architecture featuring robust multistream synergy, pyramid time-frequency attention (PTFA), and bidirectional adaptive convolutions. The framework uniquely integrates: 1) Aligned time-frequency streams maintaining temporal synchronization with STFT for bidirectional modeling, 2) PTFA-based multi-scale feature enhancement amplifying critical neural patterns, 3) BiTCN with learnable fusion capturing forward/backward neural dynamics. Demonstrating enhanced robustness, BITE achieves state-of-the-art performance across four divergent paradigms (BCICIV-2A/2B, HGD, SD-SSVEP), excelling in both within-subject accuracy and cross-subject generalization. As a unified architecture, it combines robust performance across both MI and SSVEP tasks with exceptional computational efficiency. Our work validates that paradigm-aligned spectral-temporal processing is essential for reliable BCI systems. Just as its name suggests, BITE "takes a bite out of EEG." The source code is available at https://github.com/cindy-hong/BiteEEG.

CVNov 13, 2025
Multivariate Gaussian Representation Learning for Medical Action Evaluation

Luming Yang, Haoxian Liu, Siqing Li et al.

Fine-grained action evaluation in medical vision faces unique challenges due to the unavailability of comprehensive datasets, stringent precision requirements, and insufficient spatiotemporal dynamic modeling of very rapid actions. To support development and evaluation, we introduce CPREval-6k, a multi-view, multi-label medical action benchmark containing 6,372 expert-annotated videos with 22 clinical labels. Using this dataset, we present GaussMedAct, a multivariate Gaussian encoding framework, to advance medical motion analysis through adaptive spatiotemporal representation learning. Multivariate Gaussian Representation projects the joint motions to a temporally scaled multi-dimensional space, and decomposes actions into adaptive 3D Gaussians that serve as tokens. These tokens preserve motion semantics through anisotropic covariance modeling while maintaining robustness to spatiotemporal noise. Hybrid Spatial Encoding, employing a Cartesian and Vector dual-stream strategy, effectively utilizes skeletal information in the form of joint and bone features. The proposed method achieves 92.1% Top-1 accuracy with real-time inference on the benchmark, outperforming the ST-GCN baseline by +5.9% accuracy with only 10% FLOPs. Cross-dataset experiments confirm the superiority of our method in robustness.