CV · Nov 26, 2025

TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models

Jiaming He, Guanyu Hou, Hongwei Li, Zhicong Huang, Kangjie Chen, Yi Yu, Wenbo Jiang, Guowen Xu, Tianwei Zhang

arXiv:2511.21145v13.6 · h-index: 9 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses safety evaluation challenges for text-to-video models. It is an incremental improvement over existing methods, which focus on static content.

The paper tackles the problem of evaluating safety risks in text-to-video models by proposing TEAR, a temporal-aware automated red-teaming framework that exploits temporal dynamics to elicit policy-violating videos. It reports an attack success rate of over 80%, compared with the prior best of 57%.

Text-to-Video (T2V) models can synthesize high-quality, temporally coherent video content, but this generative diversity also introduces critical safety challenges. Existing safety evaluation methods, which focus on static image and text generation, are insufficient to capture the complex temporal dynamics of video generation. To address this, we propose TEAR, a TEmporal-aware Automated Red-teaming framework designed to uncover safety risks specifically linked to the dynamic temporal sequencing of T2V models. TEAR employs a temporal-aware test generator, optimized in two stages (initial generator training followed by temporal-aware online preference learning), to craft textually innocuous prompts that exploit temporal dynamics to elicit policy-violating video outputs. A refine model is then applied cyclically to improve prompt stealthiness and adversarial effectiveness. Extensive experiments demonstrate the effectiveness of TEAR across open-source and commercial T2V systems, with an attack success rate of over 80%, a significant improvement over the prior best result of 57%.
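The abstract reports results as an attack success rate and describes a refine step applied cyclically. The Python sketch below illustrates only that bookkeeping, under the assumption that the rate is the fraction of test prompts whose generated video is judged policy-violating. It is not the paper's implementation: `red_team_prompt`, `attack_success_rate`, `max_rounds`, and the three toy stubs are hypothetical names introduced here, and the stubs are trivial string functions standing in for a T2V model, a safety judge, and a refine model.

```python
from typing import Callable, List


def attack_success_rate(outcomes: List[bool]) -> float:
    """Fraction of test prompts that ended in a policy-violating output."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


def red_team_prompt(
    seed_prompt: str,
    generate_video: Callable[[str], str],
    is_violating: Callable[[str], bool],
    refine: Callable[[str, str], str],
    max_rounds: int = 3,  # hypothetical budget, not taken from the paper
) -> bool:
    """Try a prompt, refining it after each failed round, up to max_rounds."""
    prompt = seed_prompt
    for _ in range(max_rounds):
        video = generate_video(prompt)
        if is_violating(video):
            return True
        # Feed the failed attempt back into the refine step.
        prompt = refine(prompt, video)
    return False


# Toy stand-ins (assumptions for illustration only).
def toy_generate(prompt: str) -> str:
    return prompt.upper()          # placeholder for a T2V model


def toy_judge(video: str) -> bool:
    return video.count("THEN") >= 2  # placeholder for a safety judge


def toy_refine(prompt: str, video: str) -> str:
    return prompt + " then"        # placeholder for a refine model


seeds = [
    "scene one",
    "scene one then scene two",
    "scene one then two then three",
]
outcomes = [
    red_team_prompt(s, toy_generate, toy_judge, toy_refine, max_rounds=2)
    for s in seeds
]
print(f"attack success rate: {attack_success_rate(outcomes):.2f}")
```

In this toy run, a prompt counts as a success if any round within the budget is judged violating, so the reported rate depends on `max_rounds` as well as on the judge. The paper's actual evaluation protocol may differ.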
