CY | CL | CV | Feb 22, 2025

A Framework for Evaluating Vision-Language Model Safety: Building Trust in AI for Public Sector Applications

arXiv:2502.16361v1 | 1 citation | h-index: 13
Originality: Incremental advance
AI Analysis

The paper addresses safety concerns for public sector AI deployments, but the advance is incremental because it builds on existing adversarial attack methods such as FGSM.

The paper evaluates the safety and adversarial vulnerability of Vision-Language Models intended for public sector use. Its framework quantifies risk through noise analysis and adversarial attacks, and combines the results into a new Vulnerability Score for assessing model robustness.

Vision-Language Models (VLMs) are increasingly deployed in public sector missions, necessitating robust evaluation of their safety and vulnerability to adversarial attacks. This paper introduces a novel framework to quantify adversarial risks in VLMs. We analyze model performance under Gaussian, salt-and-pepper, and uniform noise, identifying misclassification thresholds and deriving composite noise patches and saliency patterns that highlight vulnerable regions. These patterns are compared against the Fast Gradient Sign Method (FGSM) to assess their adversarial effectiveness. We propose a new Vulnerability Score that combines the impact of random noise and adversarial attacks, providing a comprehensive metric for evaluating model robustness.
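The noise analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `[0, 1]` pixel range, the noise-level grid, and the toy brightness classifier are all assumptions made for illustration, and the paper's actual Vulnerability Score formula is not reproduced here. The sketch applies the three noise types named in the abstract and reports the smallest noise level at which a classifier's prediction changes.

```python
import numpy as np

def gaussian_noise(img, level, rng):
    """Add zero-mean Gaussian noise with standard deviation `level`."""
    return np.clip(img + rng.normal(0.0, level, img.shape), 0.0, 1.0)

def salt_and_pepper_noise(img, level, rng):
    """Set a random fraction `level` of pixels to 0 or 1."""
    out = img.copy()
    mask = rng.random(img.shape) < level
    out[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(img.dtype)
    return out

def uniform_noise(img, level, rng):
    """Add noise drawn uniformly from [-level, level]."""
    return np.clip(img + rng.uniform(-level, level, img.shape), 0.0, 1.0)

def misclassification_threshold(predict, img, label, noise_fn, levels, rng, trials=5):
    """Return the smallest level in `levels` (assumed ascending) at which
    any of `trials` noisy copies of `img` is misclassified, or None."""
    for level in levels:
        for _ in range(trials):
            if predict(noise_fn(img, level, rng)) != label:
                return float(level)
    return None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.full((8, 8), 0.9)                    # hypothetical bright image
    predict = lambda x: int(x.mean() > 0.5)       # hypothetical stand-in classifier
    levels = np.linspace(0.05, 1.0, 20)           # hypothetical noise-level grid
    for fn in (gaussian_noise, salt_and_pepper_noise, uniform_noise):
        t = misclassification_threshold(predict, img, 1, fn, levels, rng)
        print(fn.__name__, t)
```

In a real evaluation, `predict` would wrap the VLM under test, and the per-noise thresholds would be one input to a robustness metric alongside the FGSM comparison the abstract mentions.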

