CV · Mar 23

PROBE: Diagnosing Residual Concept Capacity in Erased Text-to-Video Diffusion Models

arXiv:2603.21547 · 93.0 · h-index: 1 · Has Code

Predicted impact: top 11% in CV · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses safety auditing for AI video generation by exposing limitations in existing concept erasure techniques. The advance is incremental but matters for improving model safety.

The paper evaluates concept erasure in text-to-video diffusion models. It finds that current methods suppress output-level content without fully removing representational capacity: erased concepts can be measurably reactivated across three architectures and three erasure strategies.

Concept erasure techniques for text-to-video (T2V) diffusion models report substantial suppression of sensitive content, yet current evaluation is limited to checking whether the target concept is absent from generated frames, treating output-level suppression as evidence of representational removal. We introduce PROBE, a diagnostic protocol that quantifies the reactivation potential of erased concepts in T2V models. With all model parameters frozen, PROBE optimizes a lightweight pseudo-token embedding through a denoising reconstruction objective combined with a novel latent alignment constraint that anchors recovery to the spatiotemporal structure of the original concept. We make three contributions: (1) a multi-level evaluation framework spanning classifier-based detection, semantic similarity, temporal reactivation analysis, and human validation; (2) systematic experiments across three T2V architectures, three concept categories, and three erasure strategies revealing that all tested methods leave measurable residual capacity whose robustness correlates with intervention depth; and (3) the identification of temporal re-emergence, a video-specific failure mode where suppressed concepts progressively resurface across frames, invisible to frame-level metrics. These findings suggest that current erasure methods achieve output-level suppression rather than representational removal. We release our protocol to support reproducible safety auditing. Our code is available at https://github.com/YiweiXie/PRObingBasedEvaluation.
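
To make the temporal re-emergence idea concrete, here is a minimal sketch of how per-frame detector scores could be summarized so that a progressive resurfacing pattern becomes visible. It is not taken from the paper or its repository: the function name `temporal_reactivation`, the 0.5 threshold, the choice of summary statistics, and the example scores are all assumptions made for illustration, and the paper's own temporal reactivation analysis may be defined differently.

```python
from statistics import mean


def temporal_reactivation(scores, threshold=0.5):
    """Summarize per-frame concept scores for one generated video.

    scores: detector confidences in [0, 1], one per frame, in frame order.
    threshold: hypothetical cutoff above which a frame counts as a detection.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two frames")

    t_mean = (n - 1) / 2
    s_mean = mean(scores)

    # Least-squares slope of score against frame index:
    # a positive slope means the concept gets stronger over time.
    cov = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(scores))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var

    # Index of the first frame whose score reaches the threshold, if any.
    first_hit = next((t for t, s in enumerate(scores) if s >= threshold), None)

    return {
        "mean_score": s_mean,                # a frame-averaged summary
        "slope": slope,                      # trend across frames
        "first_hit": first_hit,              # when the concept first reappears
        "late_mean": mean(scores[n // 2:]),  # average over the second half
    }


# Hypothetical scores for an 8-frame clip generated by an erased model.
scores = [0.05, 0.08, 0.10, 0.20, 0.35, 0.55, 0.70, 0.80]
report = temporal_reactivation(scores)
```

In this made-up example the frame-averaged score stays below the threshold, so an averaged metric would count the concept as erased, while the positive slope and the above-threshold second half show it resurfacing late in the clip.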

