CV · Nov 25, 2025

iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation

arXiv:2511.20635v2 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for versatile image generation and editing tools in computer vision, though it appears incremental as it adapts existing video models rather than introducing a fundamentally new approach.

The paper generates diverse, dynamic image sets by adapting a pre-trained video model to the unconstrained content diversity of image data. The resulting unified framework maintains strong cross-image consistency while producing a wider dynamic range than the video prior alone.

Pre-trained video models learn powerful priors for generating high-quality, temporally coherent content. While these models excel at temporal coherence, their dynamics are often constrained by the continuous nature of their training data. We hypothesize that by injecting the rich and unconstrained content diversity from image data into this coherent temporal framework, we can generate image sets that feature both natural transitions and a far more expansive dynamic range. To this end, we introduce iMontage, a unified framework designed to repurpose a powerful video model into an all-in-one image generator. The framework consumes and produces variable-length image sets, unifying a wide array of image generation and editing tasks. To achieve this, we propose an elegant and minimally invasive adaptation strategy, complemented by a tailored data curation process and training paradigm. This approach allows the model to acquire broad image manipulation capabilities without corrupting its invaluable original motion priors. iMontage excels across several mainstream many-in-many-out tasks, not only maintaining strong cross-image contextual consistency but also generating scenes with extraordinary dynamics that surpass conventional scopes. Find our homepage at: https://kr1sjfu.github.io/iMontage-web/.
