CV · Feb 20

BLM-Guard: Explainable Multimodal Ad Moderation with Chain-of-Thought and Policy-Aligned Rewards

Yiran Yang, Zhaowei Liu, Yuan Yuan, Yukun Song, Xiong Ma, Yinghao Song, Xiangji Zeng, Lu Sun, Yulu Wang, Hai Zhou, Shuai Cui, Zhaohan Gong

arXiv:2602.18193v14.02 citations · h-index: 9

Originality: Incremental advance

AI Analysis

The paper addresses the need for finer-grained, policy-driven moderation of commercial ads to combat deceptive content. It appears incremental, since it builds on existing multimodal and reasoning techniques.

The paper tackles the moderation of deceptive multimodal ads on short-video platforms. It introduces BLM-Guard, a framework that combines Chain-of-Thought reasoning with policy-aligned rewards, and reports improved accuracy, consistency, and generalization over baselines in experiments on real ads.

Short-video platforms now host vast multimodal ads whose deceptive visuals, speech and subtitles demand finer-grained, policy-driven moderation than community safety filters. We present BLM-Guard, a content-audit framework for commercial ads that fuses Chain-of-Thought reasoning with rule-based policy principles and a critic-guided reward. A rule-driven ICoT data-synthesis pipeline jump-starts training by generating structured scene descriptions, reasoning chains and labels, cutting annotation costs. Reinforcement learning then refines the model using a composite reward balancing causal coherence with policy adherence. A multitask architecture models intra-modal manipulations (e.g., exaggerated imagery) and cross-modal mismatches (e.g., subtitle-speech drift), boosting robustness. Experiments on real short-video ads show BLM-Guard surpasses strong baselines in accuracy, consistency and generalization.
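The abstract says the reinforcement-learning stage uses a composite reward that balances causal coherence with policy adherence, but it does not give the formula. As a rough illustration only, one common way to balance two such terms is a normalized weighted sum. Everything in the sketch below is an assumption rather than the authors' method: the names `RewardWeights` and `composite_reward`, the equal default weights, and the convention that both scores lie in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the abstract does not state how the two terms are balanced.
    coherence: float = 0.5
    policy: float = 0.5


def composite_reward(
    coherence_score: float,
    policy_score: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Blend two scores in [0, 1] into one scalar reward in [0, 1].

    Assumed form: a normalized weighted sum. The paper may use a different rule.
    """
    for name, value in (
        ("coherence_score", coherence_score),
        ("policy_score", policy_score),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    total = weights.coherence + weights.policy
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalizing by the weight sum keeps the result in [0, 1].
    return (
        weights.coherence * coherence_score + weights.policy * policy_score
    ) / total


if __name__ == "__main__":
    # A reasoning chain judged fairly coherent but only partly policy-compliant.
    print(composite_reward(0.8, 0.4, RewardWeights(coherence=3.0, policy=1.0)))
```

In a setup like this, shifting weight toward the policy term would penalize fluent but non-compliant reasoning more heavily. How the actual framework makes that trade-off is not stated in the abstract.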
