LG, Feb 21

When World Models Dream Wrong: Physical-Conditioned Adversarial Attacks against World Models

Zhixiang Guo, Siyuan Liang, Andras Balogh, Noah Lunberry, Rong-Cheng Tu, Mark Jelasity, Dacheng Tao

arXiv:2602.18739v1, 3.82 citations, h-index: 32

Originality: Incremental advance

AI Analysis

This work addresses security risks for autonomous driving systems by revealing and quantifying vulnerabilities in generative world models, an incremental but important step toward stronger safety checks.

The paper addresses security vulnerabilities in generative world models used to synthesize driving videos. It introduces PhysCond-WMA, a white-box attack that perturbs physical-condition channels to induce distortions while preserving perceptual fidelity. In the targeted setting, the attack success rate reaches 0.55; training on attacked videos degrades 3D detection by about 4% and open-loop planning by about 20%.

Generative world models (WMs) are increasingly used to synthesize controllable, sensor-conditioned driving videos, yet their reliance on physical priors exposes novel attack surfaces. In this paper, we present Physical-Conditioned World Model Attack (PhysCond-WMA), the first white-box world model attack that perturbs physical-condition channels, such as HDMap embeddings and 3D-box features, to induce semantic, logic, or decision-level distortion while preserving perceptual fidelity. PhysCond-WMA is optimized in two stages: (1) a quality-preserving guidance stage that constrains reverse-diffusion loss below a calibrated threshold, and (2) a momentum-guided denoising stage that accumulates target-aligned gradients along the denoising trajectory for stable, temporally coherent semantic shifts. Extensive experimental results demonstrate that our approach remains effective while increasing FID by about 9% on average and FVD by about 3.9% on average. Under the targeted attack setting, the attack success rate (ASR) reaches 0.55. Downstream studies further show tangible risk: using attacked videos for training decreases 3D detection performance by about 4% and worsens open-loop planning performance by about 20%. These findings reveal and quantify, for the first time, security vulnerabilities in generative world models, motivating more comprehensive security checkers.
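The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so everything below is an assumption made for illustration. The names `toy_attack`, `step_maps`, `tau`, `alpha`, and `mu` are hypothetical; each "denoising step" is a plain linear map standing in for a real denoiser; the quality guard is a simple rescaling of the perturbation, swapped in for the paper's reverse-diffusion-loss constraint; and the momentum update follows the familiar MI-FGSM pattern (L1-normalized gradients, sign step).

```python
import numpy as np

def toy_attack(c_clean, step_maps, target, tau=0.01, alpha=0.01, mu=0.9):
    """Toy momentum attack on a condition vector (illustrative only).

    c_clean   : clean condition embedding (stand-in for HDMap / 3D-box features)
    step_maps : one matrix per hypothetical denoising step
    target    : output the attacker wants the toy model to produce
    tau       : upper bound on the proxy quality loss mean(delta**2)
    """
    delta = np.zeros_like(c_clean)
    momentum = np.zeros_like(c_clean)
    for A in step_maps:
        # Gradient of the target loss ||A (c + delta) - target||^2 w.r.t. delta.
        residual = A @ (c_clean + delta) - target
        grad = 2.0 * A.T @ residual
        # Accumulate L1-normalized gradients along the trajectory (MI-FGSM style).
        momentum = mu * momentum + grad / (np.abs(grad).sum() + 1e-12)
        # Signed descent step toward the target.
        delta = delta - alpha * np.sign(momentum)
        # Quality guard (assumed proxy): rescale so mean(delta**2) stays <= tau.
        quality = np.mean(delta ** 2)
        if quality > tau:
            delta = delta * np.sqrt(tau / quality)
    return delta

# Hypothetical usage: 4-dimensional condition, 20 identical identity "steps".
c_clean = np.zeros(4)
target = np.ones(4)
delta = toy_attack(c_clean, [np.eye(4)] * 20, target)
print("proxy quality loss:", np.mean(delta ** 2))
print("target loss:", np.sum((c_clean + delta - target) ** 2))
```

In this toy setting the target is deliberately out of reach under the budget `tau`, so the guard becomes active and the perturbation settles on the boundary of the allowed region. A real attack would replace the linear maps with gradients obtained by backpropagating through the world model's denoiser.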

