CV · CL · May 17

SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening

arXiv:2605.17610 · 94.1 · Has Code
Predicted impact: top 10% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For video content moderation, SafeLens provides an efficient and accurate solution that outperforms both open-source and closed-source models, demonstrating that efficient design can be more effective than scaling data or model size.

SafeLens introduces a fast-and-slow inference architecture for video guardrails, achieving state-of-the-art performance on real-world and AI-generated video benchmarks while significantly reducing inference cost compared to existing models like SafeWatch-8B, OmniGuard-7B, GPT-5.4, and Gemini-3.1-pro.
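The fast-and-slow idea summarized above can be illustrated with a small routing sketch. This is not the authors' implementation: the summary does not say how SafeLens decides which videos need the slow path, so the confidence-margin rule, the `screen` function, the `Verdict` type, and the `fast_fn` / `slow_fn` callables are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Verdict:
    unsafe: bool  # final moderation decision
    route: str    # "fast" or "slow", i.e. which path produced the decision


def screen(
    video: Any,
    fast_fn: Callable[[Any], float],  # hypothetical cheap screener: P(unsafe) in [0, 1]
    slow_fn: Callable[[Any], bool],   # hypothetical expensive reasoner: final decision
    margin: float = 0.2,              # assumed confidence margin, not from the paper
) -> Verdict:
    """Route a video through the fast path and escalate only when it is unsure."""
    p_unsafe = fast_fn(video)
    # Confident fast prediction: accept it and skip the expensive model.
    if abs(p_unsafe - 0.5) >= margin:
        return Verdict(unsafe=p_unsafe >= 0.5, route="fast")
    # Ambiguous case: spend extra compute on deliberate reasoning.
    return Verdict(unsafe=slow_fn(video), route="slow")
```

Under this assumed rule, per-input cost varies: clear-cut videos pay only for the fast screener, and only the ambiguous remainder pays for the slow model.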

The rapid growth of online video platforms and AI-generated content has made reliable video guardrails a key challenge for safety and real-world deployment. While most videos can be screened through fast pattern recognition, a small subset requires deeper reasoning over temporally complex content and nuanced policy constraints. Existing approaches typically rely on large vision-language models applied uniformly across all inputs, resulting in high inference costs and inefficient allocation of computation. We propose SafeLens, a video guardrail framework that introduces a fast-and-slow inference architecture for efficient and accurate content moderation with variable computational cost across inputs. Additionally, we construct a high-quality dataset by applying influence-guided filtering to the SafeWatch Dataset, retaining only 2.4% of the original data. To further address limitations of training-time scaling, we enable test-time reasoning by augmenting the filtered data with structured Chain-of-Thought traces. Across real-world and AI-generated video benchmarks, SafeLens achieves state-of-the-art performance, outperforming strong open-source video guardrails (e.g., SafeWatch-8B, OmniGuard-7B) and closed-source models (e.g., GPT-5.4, Gemini-3.1-pro) while significantly reducing inference cost, demonstrating that efficient design can be more effective than scaling data or model size alone.
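The data-filtering step mentioned in the abstract can be pictured as a top-fraction selection over per-example scores. The sketch below is only an assumption about the general shape of such a step: the abstract does not describe how influence scores are computed or whether selection is a simple top-k cut, so `filter_by_influence` and its inputs are hypothetical, and the scores are taken as given.

```python
from typing import List, Sequence


def filter_by_influence(
    scores: Sequence[float],       # hypothetical per-example influence scores
    keep_fraction: float = 0.024,  # 2.4%, the retention rate quoted in the abstract
) -> List[int]:
    """Return indices of the highest-scoring examples, in original dataset order."""
    if not scores:
        return []
    # Keep at least one example so a tiny dataset is never filtered to nothing.
    k = max(1, round(len(scores) * keep_fraction))
    # Rank example indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore dataset order for the retained subset.
    return sorted(ranked[:k])
```

For a dataset of 1,000 examples this keeps 24 of them; the retained examples would then be the ones augmented with Chain-of-Thought traces.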

