5 Papers

CV · Jun 14, 2023
Deblurring Masked Autoencoder is Better Recipe for Ultrasound Image Recognition

Qingbo Kang, Jun Gao, Kang Li et al.

The masked autoencoder (MAE) has attracted unprecedented attention and achieves remarkable performance in many vision tasks. It reconstructs randomly masked image patches (the proxy task) during pretraining and learns meaningful semantic representations that can be transferred to downstream tasks. However, MAE has not been thoroughly explored in ultrasound imaging. In this work, we investigate the potential of MAE for ultrasound image recognition. Motivated by the high noise-to-signal ratio characteristic of ultrasound imaging, we propose a novel deblurring MAE approach that incorporates deblurring into the proxy task during pretraining. The addition of deblurring helps pretraining better recover the subtle details present in ultrasound images, thus improving the performance of the downstream classification task. Our experimental results demonstrate the effectiveness of our deblurring MAE, which achieves state-of-the-art performance in ultrasound image classification. Overall, our work highlights the potential of MAE for ultrasound image recognition and presents a novel approach that incorporates deblurring to further improve its effectiveness.
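
The abstract does not spell out how deblurring enters the proxy task, so the following is only a minimal sketch under stated assumptions: the model input is a blurred and patch-masked copy of the image, and the reconstruction target is the original sharp image. The function names, the 3x3 box blur, the zero-fill masking, and the default mask ratio are all illustrative choices, not details taken from the paper.

```python
import random

def box_blur(img):
    """3x3 mean filter with edge clamping (assumed stand-in for the paper's blur)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sum(vals) / 9.0
    return out

def make_pretraining_pair(img, patch=2, mask_ratio=0.75, seed=0):
    """Build one (model_input, target, masked_patches) triple.

    model_input: blurred image with a random subset of patches zeroed out.
    target:      the original sharp image, so reconstruction implies deblurring.
    """
    h, w = len(img), len(img[0])
    rng = random.Random(seed)
    patches = [(py, px) for py in range(0, h, patch) for px in range(0, w, patch)]
    masked = set(rng.sample(patches, int(len(patches) * mask_ratio)))
    model_input = box_blur(img)
    for py, px in masked:
        for y in range(py, min(py + patch, h)):
            for x in range(px, min(px + patch, w)):
                # Zero-fill is a simplification; MAE encoders usually drop masked tokens.
                model_input[y][x] = 0.0
    return model_input, img, masked
```

A reconstruction loss between the model output and `target` would then penalize both missing patches and residual blur in the visible ones.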

CV · Feb 11
Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation

Guangjing Yang, ZhangYuan Yu, Ziyuan Qin et al.

While recent advances in Reinforcement Fine-Tuning (RFT) have shown that rule-based reward schemes can enable effective post-training for large language models, their extension to cross-modal, vision-centric domains remains largely underexplored. This limitation is especially pronounced in the medical imaging domain, where effective performance requires both robust visual perception and structured reasoning. In this work, we address this gap by proposing VRFT-Aug, a visual reinforcement fine-tuning framework tailored for the medical domain. VRFT-Aug introduces a series of training strategies designed to augment both perception and reasoning, including prior knowledge injection, perception-driven policy refinement, medically informed reward shaping, and behavioral imitation. Together, these methods aim to stabilize and improve the RFT process. Through extensive experiments across multiple medical datasets, we show that our approaches consistently outperform both standard supervised fine-tuning and RFT baselines. Moreover, we provide empirically grounded insights and practical training heuristics that can be generalized to other medical image tasks. We hope this work contributes actionable guidance and fresh inspiration for the ongoing effort to develop reliable, reasoning-capable models for high-stakes medical applications.

CV · Mar 2, 2025
Confounder-Aware Medical Data Selection for Fine-Tuning Pretrained Vision Models

Anyang Ji, Qingbo Kang, Wei Xu et al.

The emergence of large-scale pre-trained vision foundation models has greatly advanced the medical imaging field through the pre-training and fine-tuning paradigm. However, selecting appropriate medical data for downstream fine-tuning remains a significant challenge given annotation cost, privacy concerns, and the detrimental effects of confounding variables. In this work, we present a confounder-aware data selection approach for medical dataset curation that aims to select a minimal representative subset by mitigating the undesirable impact of confounding variables while preserving the natural distribution of the dataset. Our approach first identifies confounding variables within the data and then applies a distance-based selection strategy for confounder-aware sampling under a constrained data-size budget. We validate our approach through extensive experiments across diverse medical imaging modalities, which show that, compared to other data selection approaches, it addresses the substantial impact of confounding variables and improves fine-tuning efficiency in the medical imaging domain.

CR · Feb 10, 2015
An SVD-based Fragile Watermarking Scheme With Grouped Blocks

Qingbo Kang, Ke Li, Hu Chen

This paper proposes a novel fragile watermarking scheme for digital image authentication based on Singular Value Decomposition (SVD) and grouped blocks. The watermark bits, which include two types of bits, are inserted into the least significant bit (LSB) plane of the host image, with an adaptive chaotic map determining the positions. The grouped blocks break the block-wise independence and can therefore withstand the Vector Quantization (VQ) attack. The insertion positions depend on statistical information of the image block data, which increases security and provides an auxiliary way to authenticate the image data. The scheme is evaluated against a variety of attacks, and the experimental results show that it has remarkable tamper detection and precise tamper localization abilities.
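
As a rough illustration of the LSB-plane idea only, the sketch below embeds authentication bits into the least significant bits of one flattened block, at positions picked by iterating a logistic map. It is not the authors' scheme: a SHA-256 digest of the seven most significant bits stands in for the SVD-derived bits, the grouped-block linkage that resists the VQ attack is omitted, and the seed, the map parameter 3.99, and all function names are assumptions made for the example.

```python
import hashlib

def logistic_positions(seed, count, size):
    """Pick `count` distinct indices in range(size) by iterating the logistic map."""
    x, chosen = seed, []
    while len(chosen) < count:
        x = 3.99 * x * (1.0 - x)          # chaotic regime of the logistic map
        idx = int(x * size) % size
        if idx not in chosen:
            chosen.append(idx)
    return chosen

def block_bits(block, n_bits):
    """Authentication bits from the 7 MSBs of each pixel (stand-in for SVD features)."""
    digest = hashlib.sha256(bytes(p >> 1 for p in block)).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(n_bits)]

def embed(block, seed=0.37, n_bits=8):
    """Write the authentication bits into the LSBs at chaotic-map positions."""
    out = list(block)
    bits = block_bits(out, n_bits)
    for pos, bit in zip(logistic_positions(seed, n_bits, len(out)), bits):
        out[pos] = (out[pos] & ~1) | bit  # only the LSB changes
    return out

def verify(block, seed=0.37, n_bits=8):
    """Recompute the bits from the MSBs and compare with the stored LSBs."""
    bits = block_bits(block, n_bits)
    positions = logistic_positions(seed, n_bits, len(block))
    return all((block[pos] & 1) == bit for pos, bit in zip(positions, bits))
```

Because the bits are computed from the upper seven bit planes and stored in the lowest one, embedding does not invalidate its own check. A block of 8-bit pixel values passed through `embed` satisfies `verify`, while flipping a stored LSB makes `verify` return `False`.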