IdentityGuard: Context-Aware Restriction and Provenance for Personalized Synthesis
IdentityGuard addresses the collateral damage that global safety filters cause for users of personalized synthesis models, offering a more targeted solution.
The paper tackles the safety challenge of personalized text-to-image models by proposing a context-aware approach that blocks harmful content only when it is combined with a personalized identity, preventing misuse while preserving model utility and enabling traceability.
The nature of personalized text-to-image models poses a unique safety challenge that generic, context-blind methods are ill-equipped to handle. Such global filters create a dilemma: to prevent misuse, they must erase concepts entirely, which damages the model's broader utility and causes unacceptable collateral damage. Our work presents a more precisely targeted approach, built on the principle that security should be as context-aware as the threat itself and intrinsically bound to the personalized concept. We present IDENTITYGUARD, which realizes this principle through a conditional restriction that blocks harmful content only when it is combined with the personalized identity, and a concept-specific watermark for precise traceability. Experiments show that our approach prevents misuse while preserving the model's utility and enabling robust traceability. By moving beyond blunt, global filters, our work demonstrates a more effective and responsible path toward AI safety.