CV · Nov 30, 2023

IMMA: Immunizing text-to-image Models against Malicious Adaptation

arXiv:2311.18815v3 · 11.016 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses a security problem that concerns both model developers and users: preventing text-to-image models from being used to generate unauthorized or harmful content. It is an incremental advance that offers an alternative to existing data-poisoning techniques.

The paper tackles malicious adaptation of text-to-image models by proposing IMMA, a method that immunizes model parameters against fine-tuning on harmful content. It reports that IMMA mitigates risks such as style mimicry and inappropriate content generation across three adaptation methods.

Advancements in open-sourced text-to-image models and fine-tuning methods have led to an increasing risk of malicious adaptation, i.e., fine-tuning to generate harmful or unauthorized content. Recent works, e.g., Glaze or MIST, have developed data-poisoning techniques that protect the data against adaptation methods. In this work, we consider an alternative paradigm for protection. We propose to "immunize" the model by learning model parameters that are difficult for adaptation methods to fine-tune on malicious content; in short, IMMA. Specifically, IMMA should be applied before the release of the model weights to mitigate these risks. Empirical results show IMMA's effectiveness against malicious adaptations, including mimicking an artistic style and learning inappropriate or unauthorized content, over three adaptation methods: LoRA, Textual-Inversion, and DreamBooth. The code is available at https://github.com/amberyzheng/IMMA.
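
The immunization idea in the abstract can be read as a bi-level problem. What follows is a sketch under assumptions, not a formula quoted from the abstract: the symbols \theta (the released model weights), \phi (the parameters the adaptation method trains, e.g. LoRA weights or a Textual-Inversion embedding), x' (the malicious content), and \mathcal{L}_A (the adaptation method's training loss) are all introduced here for illustration.

```latex
% Sketch only: every symbol below is an assumption, not taken from the abstract.
% Inner problem: the adapter fine-tunes \phi on the malicious data x'.
\phi^{*}(\theta) = \arg\min_{\phi} \; \mathcal{L}_A(x'; \theta, \phi)

% Outer problem: immunization picks \theta so that even the best
% adaptation still incurs a high loss on x'.
\max_{\theta} \; \mathcal{L}_A\bigl(x'; \theta, \phi^{*}(\theta)\bigr)
```

Under this reading, a well-immunized \theta is one for which fine-tuning converges to a poor fit on the malicious content, which matches the abstract's description of parameters that are difficult for the adaptation methods.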

