CV | Nov 30, 2023

IMMA: Immunizing text-to-image Models against Malicious Adaptation

arXiv:2311.18815v3 | 16 citations | h-index: 4 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a security problem for developers and users of text-to-image models: preventing a released model from being fine-tuned to generate unauthorized or harmful content. It is an incremental advance that offers a model-side alternative to existing data-poisoning techniques.

The paper tackles the problem of malicious adaptation in text-to-image models by proposing IMMA, a method to immunize model parameters against fine-tuning for harmful content, showing effectiveness in mitigating risks like style mimicry and inappropriate content generation across three adaptation methods.

Advancements in open-sourced text-to-image models and fine-tuning methods have led to the increasing risk of malicious adaptation, i.e., fine-tuning to generate harmful/unauthorized content. Recent works, e.g., Glaze or MIST, have developed data-poisoning techniques which protect the data against adaptation methods. In this work, we consider an alternative paradigm for protection. We propose to "immunize" the model by learning model parameters that are difficult for the adaptation methods when fine-tuning malicious content; in short IMMA. Specifically, IMMA should be applied before the release of the model weights to mitigate these risks. Empirical results show IMMA's effectiveness against malicious adaptations, including mimicking the artistic style and learning of inappropriate/unauthorized content, over three adaptation methods: LoRA, Textual-Inversion, and DreamBooth. The code is available at https://github.com/amberyzheng/IMMA.

Code Implementations: 3 repos
