CV | LG | Jul 2, 2024

Light-weight Fine-tuning Method for Defending Adversarial Noise in Pre-trained Medical Vision-Language Models

arXiv:2407.02716v2 | 24 citations | h-index: 6
AI Analysis

This work addresses adversarial attacks on medical multi-modal generation, offering a domain-specific solution that incrementally improves fine-tuning methods.

The paper tackles adversarial noise in pre-trained medical vision-language models. Its analysis finds that moderate upstream noise enhances model robustness and transferability, whereas higher noise levels hurt downstream task performance. To counter this, it proposes the rectify adversarial noise (RAN) framework, which is designed to defend against adversarial attacks and rectify the influence of upstream noise during fine-tuning.

Fine-tuning pre-trained Vision-Language Models (VLMs) has shown remarkable capabilities in medical image and textual depiction synergy. Nevertheless, many pre-training datasets are restricted by patient privacy concerns and may contain noise that can adversely affect downstream performance. Moreover, the growing reliance on multi-modal generation exacerbates this issue because of its susceptibility to adversarial attacks. To investigate how VLMs trained on adversarially noisy data perform on downstream medical tasks, we first craft noisy upstream datasets using multi-modal adversarial attacks. Through our comprehensive analysis, we find that moderate noise enhances model robustness and transferability, but increasing noise levels negatively impact downstream task performance. To mitigate this issue, we propose the rectify adversarial noise (RAN) framework, a recipe designed to effectively defend against adversarial attacks and rectify the influence of upstream noise during fine-tuning.
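To make the idea of crafting adversarially noisy data concrete, the following is a minimal sketch of a one-step fast gradient sign method (FGSM) perturbation. It is an illustration only: the abstract does not say which attacks the authors use, FGSM here is a swapped-in, image-only stand-in for their multi-modal attacks, and the toy logistic-regression classifier and the names `fgsm_perturb`, `bce_loss`, and `epsilon` are assumptions rather than anything from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(x, y, w, b):
    """Binary cross-entropy of a toy logistic-regression classifier."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1.0 - 1e-12)
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def fgsm_perturb(x, y, w, b, epsilon):
    """One-step FGSM: move x by epsilon along the sign of the loss gradient.

    The classifier is a hypothetical stand-in for an image encoder,
    not the model used in the paper.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w                   # d(BCE)/dx for logistic regression
    x_adv = x + epsilon * np.sign(grad_x)  # L-infinity bounded step
    return np.clip(x_adv, 0.0, 1.0)        # keep values in a valid pixel range


rng = np.random.default_rng(0)
w = rng.normal(size=16)             # toy classifier weights
b = 0.0
x = rng.uniform(0.2, 0.8, size=16)  # toy "image" flattened to 16 pixels
y = 1.0                             # true label

# Larger epsilon corresponds to a higher noise level in the crafted data.
for epsilon in (0.0, 0.05, 0.2):
    x_adv = fgsm_perturb(x, y, w, b, epsilon)
    print(f"epsilon={epsilon:.2f}  loss={bce_loss(x_adv, y, w, b):.4f}")
```

In this toy setting the loss grows as `epsilon` grows, which is one simple way to think of "moderate" versus "high" noise levels; how the paper actually parameterizes noise strength is not stated in the abstract.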
