CV · Feb 20

BLM-Guard: Explainable Multimodal Ad Moderation with Chain-of-Thought and Policy-Aligned Rewards

arXiv:2602.18193v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for finer-grained, policy-driven moderation of commercial ads to combat deceptive content, though it appears incremental because it builds on existing multimodal and reasoning techniques.

The paper tackles the moderation of deceptive multimodal ads on short-video platforms with BLM-Guard, a framework that combines Chain-of-Thought reasoning with policy-aligned rewards. In experiments on real ads, it improves accuracy, consistency, and generalization over baselines.

Short-video platforms now host vast numbers of multimodal ads whose deceptive visuals, speech and subtitles demand finer-grained, policy-driven moderation than community safety filters provide. We present BLM-Guard, a content-audit framework for commercial ads that fuses Chain-of-Thought reasoning with rule-based policy principles and a critic-guided reward. A rule-driven ICoT data-synthesis pipeline jump-starts training by generating structured scene descriptions, reasoning chains and labels, cutting annotation costs. Reinforcement learning then refines the model using a composite reward that balances causal coherence with policy adherence. A multitask architecture models intra-modal manipulations (e.g., exaggerated imagery) and cross-modal mismatches (e.g., subtitle-speech drift), boosting robustness. Experiments on real short-video ads show that BLM-Guard surpasses strong baselines in accuracy, consistency and generalization.
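The abstract does not say how the composite reward is computed. The following is only a rough illustration of a reward that balances causal coherence with policy adherence, and it rests on several assumptions, none taken from the paper. It assumes the model wraps its reasoning in `<think>` tags and its verdict in `<label>` tags. It also assumes a separate critic model supplies a coherence score in [0, 1]. Finally, it combines the terms linearly using hypothetical weights. The names `RewardWeights`, `parse_output`, and `composite_reward` are illustrative and do not come from the paper.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical output format: reasoning in <think>...</think>, verdict in <label>...</label>.
THINK_RE = re.compile(r"<think>(.+?)</think>", re.DOTALL)
LABEL_RE = re.compile(r"<label>\s*(\w+)\s*</label>")


@dataclass(frozen=True)
class RewardWeights:
    # Illustrative weights only; the paper's actual terms and values are not given here.
    label: float = 0.5      # policy adherence: predicted label matches the reference label
    coherence: float = 0.3  # causal coherence: score from an assumed critic model
    fmt: float = 0.2        # structural validity of the reasoning output


def parse_output(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract the reasoning chain and the label, returning None for any missing part."""
    think = THINK_RE.search(output)
    label = LABEL_RE.search(output)
    return (think.group(1).strip() if think else None,
            label.group(1) if label else None)


def composite_reward(output: str, gold_label: str, critic_score: float,
                     w: RewardWeights = RewardWeights()) -> float:
    """Weighted sum of label correctness, critic-judged coherence, and a format bonus."""
    reasoning, label = parse_output(output)
    if reasoning is None or label is None:
        return 0.0  # malformed outputs earn no reward at all
    coherence = min(max(critic_score, 0.0), 1.0)  # clamp the critic score into [0, 1]
    r_label = 1.0 if label == gold_label else 0.0
    return w.label * r_label + w.coherence * coherence + w.fmt * 1.0


if __name__ == "__main__":
    out = ("<think>The subtitle promises a cure, but the speech only says "
           "'supports wellness': a cross-modal mismatch.</think><label>violation</label>")
    print(composite_reward(out, "violation", critic_score=0.8))  # ≈ 0.94
```

In this sketch, malformed outputs get zero reward rather than partial credit. That choice pushes an RL-trained policy to keep its reasoning structured, but it is one design option among several; a real system might instead add a separate penalty term for formatting errors.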
