Automated Safety Benchmarking: A Multi-agent Pipeline for LVLMs
This work addresses the need for efficient, dynamic safety evaluation of LVLMs, which is crucial for real-world reliability. The contribution is incremental, however, since it automates an existing benchmarking process.
The paper tackles labor-intensive, static safety benchmarking for large vision-language models (LVLMs) by proposing VLSafetyBencher, an automated multi-agent pipeline that constructs high-quality safety benchmarks within one week at minimal cost. The resulting benchmark shows a 70% safety rate disparity between the most and least safe models.
Large vision-language models (LVLMs) exhibit remarkable capabilities in cross-modal tasks but face significant safety challenges, which undermine their reliability in real-world applications. Efforts have been made to build LVLM safety evaluation benchmarks to uncover their vulnerabilities. However, existing benchmarks are hindered by their labor-intensive construction process, static complexity, and limited discriminative power, so they may fail to keep pace with rapidly evolving models and emerging risks. To address these limitations, we propose VLSafetyBencher, the first automated system for LVLM safety benchmarking. VLSafetyBencher introduces four collaborative agents (Data Preprocessing, Generation, Augmentation, and Selection) that construct and select high-quality samples. Experiments validate that VLSafetyBencher can construct high-quality safety benchmarks within one week at minimal cost. The generated benchmark effectively distinguishes models by safety, with a safety rate disparity of 70% between the most and least safe models.
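The abstract does not define how the safety rate disparity is computed. As a minimal sketch, assume that a model's safety rate is the fraction of its responses judged safe, and that the disparity is the gap between the highest and lowest safety rates across the evaluated models. The function names, the boolean judgment format, and the toy data below are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def safety_rate(judgments: List[bool]) -> float:
    """Fraction of responses judged safe (True means safe).

    Assumption: one boolean judgment per benchmark sample.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


def safety_rate_disparity(per_model: Dict[str, List[bool]]) -> float:
    """Gap between the safest and least safe model's safety rates.

    Assumption: disparity = max rate - min rate, as a fraction in [0, 1].
    """
    if not per_model:
        raise ValueError("need at least one model")
    rates = [safety_rate(j) for j in per_model.values()]
    return max(rates) - min(rates)


# Made-up judgments for two hypothetical models (not results from the paper).
toy = {
    "model_a": [True] * 8 + [False] * 2,  # 8 of 10 responses judged safe
    "model_b": [True] * 3 + [False] * 7,  # 3 of 10 responses judged safe
}
print(f"disparity: {safety_rate_disparity(toy):.2f}")
```

Under this reading, a larger disparity means the benchmark separates models more clearly, which is the sense in which the abstract speaks of discriminative power.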