CV | Apr 12

Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation

Georgia Tech
arXiv:2604.10837 | 81.7 | h-index: 19
AI Analysis

For society and security, the paper tackles the underexplored problem of protecting static images from unauthorized animation into deepfake videos.

The paper addresses the lack of defenses against image-to-video (I2V) deepfake generation. It proposes Immune2V, which achieves stronger and more persistent degradation than adapted image-level baselines under the same imperceptibility budget.

Image-to-video (I2V) generation has the potential for societal harm because it enables the unauthorized animation of static images to create realistic deepfakes. While existing defenses effectively protect against static image manipulation, extending them to I2V generation remains underexplored and non-trivial. In this paper, we systematically analyze why modern I2V models are highly robust against naive image-level adversarial attacks (i.e., immunization). We observe that the video encoding process rapidly dilutes the adversarial noise across future frames, and that the continuous text-conditioned guidance actively overrides the intended disruptive effect of the immunization. Building on these findings, we propose the Immune2V framework, which enforces temporally balanced latent divergence at the encoder level to prevent signal dilution and aligns intermediate generative representations with a precomputed collapse-inducing trajectory to counteract the text-guidance override. Extensive experiments demonstrate that Immune2V produces substantially stronger and more persistent degradation than adapted image-level baselines under the same imperceptibility budget.
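The two constraints named in the abstract, an imperceptibility budget and a temporally balanced latent divergence, can be illustrated with a toy sketch. This is not the paper's implementation. The function names, the L-infinity form of the budget, and the max-min reading of "temporally balanced" are all assumptions made for illustration, since the abstract does not specify them.

```python
# Toy sketch only: names, the L-infinity budget, and the "minimum over frames"
# objective are illustrative assumptions, not details taken from the paper.

def project_linf(perturbed, original, eps):
    """Clip each pixel to within eps of the original, then to the valid range [0, 1]."""
    out = []
    for p, o in zip(perturbed, original):
        p = min(max(p, o - eps), o + eps)   # stay inside the perturbation budget
        out.append(min(max(p, 0.0), 1.0))   # stay a valid pixel value
    return out


def balanced_divergence(clean_latents, adv_latents):
    """Return (worst_frame, per_frame) squared distances between latent frames.

    Maximizing the worst (smallest) per-frame distance, rather than the sum,
    is one hypothetical way to keep later frames from being neglected.
    """
    per_frame = [
        sum((a - c) ** 2 for a, c in zip(adv, clean))
        for clean, adv in zip(clean_latents, adv_latents)
    ]
    return min(per_frame), per_frame


# Hypothetical latents for three frames: the divergence fades over time,
# which mirrors the dilution effect described in the abstract.
clean = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
adv = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.5]]
worst, per_frame = balanced_divergence(clean, adv)
print(per_frame, worst)
```

In this toy case a sum-based objective would be dominated by the first frame, while the minimum exposes the weakly perturbed last frame. An optimizer would alternate gradient steps on such an objective with `project_linf` to respect the budget.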
