CV · May 17

Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration

arXiv:2605.17423 · 90.9
Predicted impact: top 14% in CV (last 90 days) · Originality: Incremental advance
AI Analysis

For video generation practitioners, this work addresses the bottleneck of identity drift and semantic erosion in long-form video-to-video generation.

Soap2Soap tackles long-horizon cinematic video remaking (e.g., full episodes) through stylization or actor replacement while preserving narrative and character identity across hundreds of shots. On SoapBench, it reports strong improvements over commercial video generation APIs in long-term consistency and narrative fidelity.

We study series-level cinematic remaking, a long-horizon video-to-video generation problem that localizes full episodes or films via stylization or actor replacement while strictly preserving narrative structure, motion choreography, and character identity across hundreds of shots. Existing video generation and editing pipelines often break down in this regime due to compounding identity drift, background mutation, and semantic erosion under large camera motions and viewpoint changes. We propose Soap2Soap, a multi-agent framework that enforces long-term language-visual consistency through a Dual-Bridge Consistency mechanism: a scene-aware JSON screenplay serving as a persistent semantic backbone, and dynamically allocated visual reference anchors at both scene and shot levels. To suppress drift before video synthesis, we introduce batch keyframe consistency, jointly generating multiple keyframes in a shared latent context via a grid-based formulation. A closed-loop verification agent further audits identity, stability, and alignment to trigger selective regeneration. Experiments on SoapBench demonstrate strong improvements over commercial video generation APIs in long-term consistency and narrative fidelity.
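The batch keyframe consistency and closed-loop verification steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are toy grayscale pixel grids, `generate_grid` stands in for whatever model renders a whole grid of keyframes in one call (the "shared latent context"), and `audit` stands in for the verification agent, returning per-criterion scores such as identity, stability, and alignment. The function names, the `shot_id` field of each screenplay entry, the row-major grid layout, the 0.8 threshold, and the choice to re-render only failed shots as a smaller grid are all hypothetical. The JSON screenplay and reference anchors are not modeled.

```python
import math
from typing import Callable

# Toy stand-in for an image: a grayscale frame stored as a list of pixel rows.
Frame = list[list[int]]


def tile_keyframes(frames: list[Frame], cols: int) -> Frame:
    """Pack equally sized frames row-major into one grid canvas (empty cells stay 0)."""
    h, w = len(frames[0]), len(frames[0][0])
    rows = math.ceil(len(frames) / cols)
    canvas = [[0] * (w * cols) for _ in range(h * rows)]
    for i, frame in enumerate(frames):
        r, c = divmod(i, cols)  # grid cell for the i-th keyframe
        for y in range(h):
            canvas[r * h + y][c * w:(c + 1) * w] = frame[y]
    return canvas


def untile_keyframes(canvas: Frame, n: int, h: int, w: int, cols: int) -> list[Frame]:
    """Inverse of tile_keyframes: cut the first n cells back out of the canvas."""
    frames = []
    for i in range(n):
        r, c = divmod(i, cols)
        frames.append([row[c * w:(c + 1) * w] for row in canvas[r * h:(r + 1) * h]])
    return frames


def keyframes_with_verification(
    shots: list[dict],  # hypothetical screenplay entries, each with a "shot_id"
    generate_grid: Callable[[list[dict], int], Frame],  # hypothetical grid renderer
    audit: Callable[[dict, Frame], dict[str, float]],  # hypothetical verifier
    h: int,
    w: int,
    cols: int,
    threshold: float = 0.8,  # hypothetical acceptance threshold
    max_rounds: int = 3,
) -> tuple[dict[str, Frame], list[str]]:
    """Render all pending shots as one grid, audit every cell, and re-render
    only the shots whose lowest audit score falls below the threshold.

    Returns the latest keyframe per shot_id and the ids still failing.
    """
    keyframes: dict[str, Frame] = {}
    pending = list(shots)
    for _ in range(max_rounds):
        if not pending:
            break
        # One generation call covers every pending shot (the grid formulation).
        canvas = generate_grid(pending, cols)
        frames = untile_keyframes(canvas, len(pending), h, w, cols)
        failed = []
        for shot, frame in zip(pending, frames):
            keyframes[shot["shot_id"]] = frame
            # A shot fails if any audited criterion is below the threshold.
            if min(audit(shot, frame).values()) < threshold:
                failed.append(shot)
        pending = failed
    return keyframes, [shot["shot_id"] for shot in pending]
```

Re-rendering only the failed shots keeps regeneration selective. A real system would presumably also condition that call on the already accepted keyframes and visual reference anchors so that repaired shots stay consistent with their neighbours, which this sketch does not attempt.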
