CL · CV · Mar 18, 2025

Towards Harmless Multimodal Assistants with Blind Preference Optimization

arXiv:2503.14189v1 · 4 citations · h-index: 8 · MM
Originality: Incremental advance
AI Analysis

This work addresses safety challenges for multimodal AI assistants, offering an incremental improvement through a new dataset and optimization method.

The paper tackles safety issues in Multimodal Large Language Models (MLLMs) by constructing the MMSafe-PO preference dataset and proposing Blind Preference Optimization (BPO), which improves the base MLLM's safety rate by 45.0% and reduces unsafe rates on benchmarks like MM-SafetyBench and HarmEval.

Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. Given the extensive applications of MLLMs, the associated safety issues have become increasingly critical. Due to the effectiveness of preference optimization in aligning MLLMs with human preferences, there is an urgent need for safety-related preference data for MLLMs. To address this, we construct the MMSafe-PO preference dataset towards harmless multimodal assistants, featuring multimodal instructions, the conversational format, and ranked paired responses from human feedback. We also identify two insightful observations: modality co-defense and modality cheating, which illustrate that MLLMs possess a certain level of inherent defense while still presenting unique safety challenges. Based on these observations, we propose the Blind Preference Optimization (BPO) approach. Comprehensive experiments on three benchmarks show that BPO effectively enhances the safety capabilities of MLLMs. Notably, BPO significantly improves the safety rate of the base MLLM by 45.0%, outperforming the DPO approach. Additionally, applying BPO to the MMSafe-PO dataset greatly reduces the base MLLM's unsafe rate on other safety benchmarks (14.5% on MM-SafetyBench and 82.9% on HarmEval), demonstrating the effectiveness and robustness of both the dataset and the approach. We release code and data at https://lu-yang666.github.io/MMsafe-PO-Web/.
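
For readers unfamiliar with the DPO baseline named in the abstract, the following is a minimal sketch of the standard DPO loss on one ranked response pair. It is not the paper's BPO objective, which the abstract does not specify. The `PreferencePair` layout and its field names are hypothetical and are not the MMSafe-PO schema; the log-probabilities are assumed to be sequence-level values already computed by a policy model and a frozen reference model.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """Hypothetical record layout; NOT the actual MMSafe-PO schema."""
    image_path: str   # image part of the multimodal instruction
    instruction: str  # text part of the multimodal instruction
    chosen: str       # response ranked higher by human feedback
    rejected: str     # response ranked lower by human feedback


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the trained policy and the frozen reference model.
    beta=0.1 is an illustrative default, not a value from the paper.
    """
    # Scaled difference of the implicit rewards of the two responses.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below that value once the policy favours the chosen response more strongly than the reference does.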

