Yuan Ma

CV
h-index13
12papers
157citations
Novelty41%
AI Score46

12 Papers

CVDec 10, 2025Code
Benchmarking Real-World Medical Image Classification with Noisy Labels: Challenges, Practice, and Outlook

Yuan Ma, Junlin Hou, Chao Zhang et al.

Learning from noisy labels remains a major challenge in medical image analysis, where annotation demands expert knowledge and substantial inter-observer variability often leads to inconsistent or erroneous labels. Despite extensive research on learning with noisy labels (LNL), the robustness of existing methods in medical imaging has not been systematically assessed. To address this gap, we introduce LNMBench, a comprehensive benchmark for Label Noise in Medical imaging. LNMBench encompasses \textbf{10} representative methods evaluated across 7 datasets, 6 imaging modalities, and 3 noise patterns, establishing a unified and reproducible framework for robustness evaluation under realistic conditions. Comprehensive experiments reveal that the performance of existing LNL methods degrades substantially under high and real-world noise, highlighting the persistent challenges of class imbalance and domain variability in medical data. Motivated by these findings, we further propose a simple yet effective improvement to enhance model robustness under such conditions. The LNMBench codebase is publicly released to facilitate standardized evaluation, promote reproducible research, and provide practical insights for developing noise-resilient algorithms in both research and real-world medical applications.The codebase is publicly available on https://github.com/myyy777/LNMBench.

CVOct 9, 2023
Infrared Small Target Detection Using Double-Weighted Multi-Granularity Patch Tensor Model With Tensor-Train Decomposition

Guiyu Zhang, Qunbo Lv, Zui Tao et al.

Infrared small target detection plays an important role in the remote sensing fields. Therefore, many detection algorithms have been proposed, in which the infrared patch-tensor (IPT) model has become a mainstream tool due to its excellent performance. However, most IPT-based methods face great challenges, such as inaccurate measure of the tensor low-rankness and poor robustness to complex scenes, which will leadto poor detection performance. In order to solve these problems, this paper proposes a novel double-weighted multi-granularity infrared patch tensor (DWMGIPT) model. First, to capture different granularity information of tensor from multiple modes, a multi-granularity infrared patch tensor (MGIPT) model is constructed by collecting nonoverlapping patches and tensor augmentation based on the tensor train (TT) decomposition. Second, to explore the latent structure of tensor more efficiently, we utilize the auto-weighted mechanism to balance the importance of information at different granularity. Then, the steering kernel (SK) is employed to extract local structure prior, which suppresses background interference such as strong edges and noise. Finally, an efficient optimization algorithm based on the alternating direction method of multipliers (ADMM) is presented to solve the model. Extensive experiments in various challenging scenes show that the proposed algorithm is robust to noise and different scenes. Compared with the other eight state-of-the-art methods, different evaluation metrics demonstrate that our method achieves better detection performance in various complex scenes.

SPJan 9, 2024
Self-supervised Learning for Electroencephalogram: A Systematic Survey

Weining Weng, Yang Gu, Shuai Guo et al.

Electroencephalogram (EEG) is a non-invasive technique to record bioelectrical signals. Integrating supervised deep learning techniques with EEG signals has recently facilitated automatic analysis across diverse EEG-based tasks. However, the label issues of EEG signals have constrained the development of EEG-based deep models. Obtaining EEG annotations is difficult that requires domain experts to guide collection and labeling, and the variability of EEG signals among different subjects causes significant label shifts. To solve the above challenges, self-supervised learning (SSL) has been proposed to extract representations from unlabeled samples through well-designed pretext tasks. This paper concentrates on integrating SSL frameworks with temporal EEG signals to achieve efficient representation and proposes a systematic review of the SSL for EEG signals. In this paper, 1) we introduce the concept and theory of self-supervised learning and typical SSL frameworks. 2) We provide a comprehensive review of SSL for EEG analysis, including taxonomy, methodology, and technique details of the existing EEG-based SSL frameworks, and discuss the difference between these methods. 3) We investigate the adaptation of the SSL approach to various downstream tasks, including the task description and related benchmark datasets. 4) Finally, we discuss the potential directions for future SSL-EEG research.

CRApr 27
Detecting Avalanche Effect in Adversarial Settings: Spotting the Encryption Loops in Ransomware

Nanqing Luo, Xusheng Li, Haizhou Wang et al.

Spotting encryption loops in binary-only ransomware is a critical reverse engineering task. Since the existence of avalanche effect, an intrinsic characteristic of any secure encryption algorithms, is unavoidable during a victim data encryption attack, it is a very promising direction to spot encryption loops through avalanche effect detection. Unfortunately, no existing work in this direction ensures that the being-checked effect is the avalanche effect itself. Although CipherXRay is inspired by avalanche effect, it only checks whether a "ripple effect" (i.e., a necessary but non-sufficient condition) of avalanche effect exists, allowing a straightforward counterattack to succeed. In this work, we present a new approach that checks the avalanche effect itself. Because the detection is conducted in adversarial settings (e.g., the ransomware author may obfuscate the code), a viable approach must tolerate inaccurate input \& output identification and must be resilient to adversarial evasion. These challenges are addressed by a novel record-and-replay detection mechanism that takes advantage of the statistical guarantees provided by the Shapiro-Wilk normality test. The experimental results show that our approach achieves 0.0\% false negative rate and 1.1\% false positive rate. When our tool is employed to reverse engineer real-world ransomware samples, it succeeds in analyzing all the ransomware samples selected from ten representative families.

ROSep 16, 2025
The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning

Titong Jiang, Xuefeng Jiang, Yuan Ma et al.

We present LightVLA, a simple yet effective differentiable token pruning framework for vision-language-action (VLA) models. While VLA models have shown impressive capability in executing real-world robotic tasks, their deployment on resource-constrained platforms is often bottlenecked by the heavy attention-based computation over large sets of visual tokens. LightVLA addresses this challenge through adaptive, performance-driven pruning of visual tokens: It generates dynamic queries to evaluate visual token importance, and adopts Gumbel softmax to enable differentiable token selection. Through fine-tuning, LightVLA learns to preserve the most informative visual tokens while pruning tokens which do not contribute to task execution, thereby improving efficiency and performance simultaneously. Notably, LightVLA requires no heuristic magic numbers and introduces no additional trainable parameters, making it compatible with modern inference frameworks. Experimental results demonstrate that LightVLA outperforms different VLA models and existing token pruning methods across diverse tasks on the LIBERO benchmark, achieving higher success rates with substantially reduced computational overhead. Specifically, LightVLA reduces FLOPs and latency by 59.1% and 38.2% respectively, with a 2.6% improvement in task success rate. Meanwhile, we also investigate the learnable query-based token pruning method LightVLA* with additional trainable parameters, which also achieves satisfactory performance. Our work reveals that as VLA pursues optimal performance, LightVLA spontaneously learns to prune tokens from a performance-driven perspective. To the best of our knowledge, LightVLA is the first work to apply adaptive visual token pruning to VLA tasks with the collateral goals of efficiency and performance, marking a significant step toward more efficient, powerful and practical real-time robotic systems.

ROMay 14, 2025
TransDiffuser: Diverse Trajectory Generation with Decorrelated Multi-modal Representation for End-to-end Autonomous Driving

Xuefeng Jiang, Yuan Ma, Pengxiang Li et al.

In recent years, diffusion models have demonstrated remarkable potential across diverse domains, from vision generation to language modeling. Transferring its generative capabilities to modern end-to-end autonomous driving systems has also emerged as a promising direction. However, existing diffusion-based trajectory generative models often exhibit mode collapse where different random noises converge to similar trajectories after the denoising process.Therefore, state-of-the-art models often rely on anchored trajectories from pre-defined trajectory vocabulary or scene priors in the training set to mitigate collapse and enrich the diversity of generated trajectories, but such inductive bias are not available in real-world deployment, which can be challenged when generalizing to unseen scenarios. In this work, we investigate the possibility of effectively tackling the mode collapse challenge without the assumption of pre-defined trajectory vocabulary or pre-computed scene priors. Specifically, we propose TransDiffuser, an encoder-decoder based generative trajectory planning model, where the encoded scene information and motion states serve as the multi-modal conditional input of the denoising decoder. Different from existing approaches, we exploit a simple yet effective multi-modal representation decorrelation optimization mechanism during the denoising process to enrich the latent representation space which better guides the downstream generation. Without any predefined trajectory anchors or pre-computed scene priors, TransDiffuser achieves the PDMS of 94.85 on the closed-loop planning-oriented benchmark NAVSIM, surpassing previous state-of-the-art methods. Qualitative evaluation further showcases TransDiffuser generates more diverse and plausible trajectories which explore more drivable area.

IVDec 28, 2024
Implementing Trust in Non-Small Cell Lung Cancer Diagnosis with a Conformalized Uncertainty-Aware AI Framework in Whole-Slide Images

Xiaoge Zhang, Tao Wang, Chao Yan et al.

Ensuring trustworthiness is fundamental to the development of artificial intelligence (AI) that is considered societally responsible, particularly in cancer diagnostics, where a misdiagnosis can have dire consequences. Current digital pathology AI models lack systematic solutions to address trustworthiness concerns arising from model limitations and data discrepancies between model deployment and development environments. To address this issue, we developed TRUECAM, a framework designed to ensure both data and model trustworthiness in non-small cell lung cancer subtyping with whole-slide images. TRUECAM integrates 1) a spectral-normalized neural Gaussian process for identifying out-of-scope inputs and 2) an ambiguity-guided elimination of tiles to filter out highly ambiguous regions, addressing data trustworthiness, as well as 3) conformal prediction to ensure controlled error rates. We systematically evaluated the framework across multiple large-scale cancer datasets, leveraging both task-specific and foundation models, illustrate that an AI model wrapped with TRUECAM significantly outperforms models that lack such guidance, in terms of classification accuracy, robustness, interpretability, and data efficiency, while also achieving improvements in fairness. These findings highlight TRUECAM as a versatile wrapper framework for digital pathology AI models with diverse architectural designs, promoting their responsible and effective applications in real-world settings.

CRDec 22, 2024
Backdoor Attack with Invisible Triggers Based on Model Architecture Modification

Yuan Ma, Jiankang Wei, Yilun Lyu et al.

Machine learning systems are vulnerable to backdoor attacks, where attackers manipulate model behavior through data tampering or architectural modifications. Traditional backdoor attacks involve injecting malicious samples with specific triggers into the training data, causing the model to produce targeted incorrect outputs in the presence of the corresponding triggers. More sophisticated attacks modify the model's architecture directly, embedding backdoors that are harder to detect as they evade traditional data-based detection methods. However, the drawback of the architectural modification based backdoor attacks is that the trigger must be visible in order to activate the backdoor. To further strengthen the invisibility of the backdoor attacks, a novel backdoor attack method is presented in the paper. To be more specific, this method embeds the backdoor within the model's architecture and has the capability to generate inconspicuous and stealthy triggers. The attack is implemented by modifying pre-trained models, which are then redistributed, thereby posing a potential threat to unsuspecting users. Comprehensive experiments conducted on standard computer vision benchmarks validate the effectiveness of this attack and highlight the stealthiness of its triggers, which remain undetectable through both manual visual inspection and advanced detection tools.

IVNov 8, 2019
Perception-oriented Single Image Super-Resolution via Dual Relativistic Average Generative Adversarial Networks

Yuan Ma, Kewen Liu, Hongxia Xiong et al.

The presence of residual and dense neural networks which greatly promotes the development of image Super-Resolution(SR) have witnessed a lot of impressive results. Depending on our observation, although more layers and connections could always improve performance, the increase of model parameters is not conducive to launch application of SR algorithms. Furthermore, algorithms supervised by L1/L2 loss can achieve considerable performance on traditional metrics such as PSNR and SSIM, yet resulting in blurry and over-smoothed outputs without sufficient high-frequency details, namely low perceptual index(PI). Regarding the issues, this paper develops a perception-oriented single image SR algorithm via dual relativistic average generative adversarial networks. In the generator part, a novel residual channel attention block is proposed to recalibrate significance of specific channels, further increasing feature expression capabilities. Parameters of convolutional layers within each block are shared to expand receptive fields while maintain the amount of tunable parameters unchanged. The feature maps are subsampled using sub-pixel convolution to obtain reconstructed high-resolution images. The discriminator part consists of two relativistic average discriminators that work in pixel domain and feature domain, respectively, fully exploiting the prior that half of data in a mini-batch are fake. Different weighted combinations of perceptual loss and adversarial loss are utilized to supervise the generator to equilibrate perceptual quality and objective results. Experimental results and ablation studies show that our proposed algorithm can rival state-of-the-art SR algorithms, both perceptually(PI-minimization) and objectively(PSNR-maximization) with fewer parameters.

IVJun 15, 2019
Single Image Super-resolution via Dense Blended Attention Generative Adversarial Network for Clinical Diagnosis

Kewen Liu, Yuan Ma, Hongxia Xiong et al.

During training phase, more connections (e.g. channel concatenation in last layer of DenseNet) means more occupied GPU memory and lower GPU utilization, requiring more training time. The increase of training time is also not conducive to launch application of SR algorithms. This's why we abandoned DenseNet as basic network. Futhermore, we abandoned this paper due to its limitation only applied on medical images. Please view our lastest work applied on general images at arXiv:1911.03464.

CVMay 13, 2019
Medical image super-resolution method based on dense blended attention network

Kewen Liu, Yuan Ma, Hongxia Xiong et al.

In order to address the issue that medical image would suffer from severe blurring caused by the lack of high-frequency details in the process of image super-resolution reconstruction, a novel medical image super-resolution method based on dense neural network and blended attention mechanism is proposed. The proposed method adds blended attention blocks to dense neural network(DenseNet), so that the neural network can concentrate more attention to the regions and channels with sufficient high-frequency details. Batch normalization layers are removed to avoid loss of high-frequency texture details. Final obtained high resolution medical image are obtained using deconvolutional layers at the very end of the network as up-sampling operators. Experimental results show that the proposed method has an improvement of 0.05db to 11.25dB and 0.6% to 14.04% on the peak signal-to-noise ratio(PSNR) metric and structural similarity index(SSIM) metric, respectively, compared with the mainstream image super-resolution methods. This work provides a new idea for theoretical studies of medical image super-resolution reconstruction.

LGOct 25, 2015
Vehicle Speed Prediction using Deep Learning

Joe Lemieux, Yuan Ma

Global optimization of the energy consumption of dual power source vehicles such as hybrid electric vehicles, plug-in hybrid electric vehicles, and plug in fuel cell electric vehicles requires knowledge of the complete route characteristics at the beginning of the trip. One of the main characteristics is the vehicle speed profile across the route. The profile will translate directly into energy requirements for a given vehicle. However, the vehicle speed that a given driver chooses will vary from driver to driver and from time to time, and may be slower, equal to, or faster than the average traffic flow. If the specific driver speed profile can be predicted, the energy usage can be optimized across the route chosen. The purpose of this paper is to research the application of Deep Learning techniques to this problem to identify at the beginning of a drive cycle the driver specific vehicle speed profile for an individual driver repeated drive cycle, which can be used in an optimization algorithm to minimize the amount of fossil fuel energy used during the trip.