CV · Jan 16, 2025

VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization

arXiv:2501.09499v1 · 1 citation · h-index: 24
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating high-quality, controllable colorized videos for media restoration and enhancement. It is an incremental improvement over existing methods.

The paper introduces VanGogh, a unified multimodal diffusion-based framework for video colorization that targets color bleeding and limited user control. The authors report superior temporal consistency and color fidelity in quantitative evaluations and user studies.

Video colorization aims to transform grayscale videos into vivid color representations while maintaining temporal consistency and structural integrity. Existing video colorization methods often suffer from color bleeding and lack comprehensive control, particularly under complex motion or diverse semantic cues. To this end, we introduce VanGogh, a unified multimodal diffusion-based framework for video colorization. VanGogh tackles these challenges using a Dual Qformer to align and fuse features from multiple modalities, complemented by a depth-guided generation process and an optical flow loss, which help reduce color overflow. Additionally, a color injection strategy and luma channel replacement are implemented to improve generalization and mitigate flickering artifacts. Thanks to this design, users can exercise both global and local control over the generation process, resulting in higher-quality colorized videos. Extensive qualitative and quantitative evaluations, and user studies, demonstrate that VanGogh achieves superior temporal consistency and color fidelity.

Project page: https://becauseimbatman0.github.io/VanGogh.
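
The abstract names luma channel replacement without giving details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a BT.601 full-range YCbCr color space and float pixel values in [0, 1]; the function names `replace_luma_pixel` and `replace_luma` are hypothetical. The generated frame supplies the chroma, while the luma is taken directly from the grayscale input, so brightness cannot drift between frames.

```python
def replace_luma_pixel(rgb, y_in):
    """Keep the chroma of a generated RGB pixel, but use the input luma y_in.

    Assumes BT.601 full-range YCbCr and values in [0, 1] (an illustrative
    choice; the paper's actual color space is not stated in the abstract).
    """
    r, g, b = rgb
    # Chroma of the generated pixel (centered at 0 rather than 0.5).
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b
    # Recombine the input luma with the generated chroma.
    r2 = y_in + 1.402 * cr
    g2 = y_in - 0.344136 * cb - 0.714136 * cr
    b2 = y_in + 1.772 * cb
    # Clip to the valid range; clipping can slightly alter the luma.
    return tuple(min(1.0, max(0.0, c)) for c in (r2, g2, b2))


def replace_luma(frame_rgb, frame_gray):
    """Apply luma replacement to one frame.

    frame_rgb:  rows of (r, g, b) tuples from the colorization model.
    frame_gray: rows of luma values from the original grayscale frame.
    """
    return [
        [replace_luma_pixel(p, y) for p, y in zip(row_rgb, row_gray)]
        for row_rgb, row_gray in zip(frame_rgb, frame_gray)
    ]
```

Applied per frame, this keeps the output luma equal to the input grayscale wherever no clipping occurs, which is one plausible reason such a step would reduce flicker.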

