CV · Jan 1

TimeColor: Flexible Reference Colorization via Temporal Concatenation

Bryan Constantine Sadihin, Yihao Meng, Michael Hua Wang, Matteo Jiahao Chen, Hang Su

arXiv:2601.00296v1 · 1.5 · h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses the need for more adaptable and consistent colorization in animation and video production. The advance is incremental: it builds on existing diffusion-based methods and adds new mechanisms for reference conditioning.

The paper tackles sketch-based video colorization by enabling flexible use of multiple heterogeneous references, such as character sheets or arbitrary colorized frames, and reports improvements in color fidelity, identity consistency, and temporal stability over prior baselines.

Most colorization models condition on only a single reference, typically the first frame of the scene. This approach ignores other sources of conditioning data, such as character sheets, background images, or arbitrary colorized frames. We propose TimeColor, a sketch-based video colorization model that supports a variable number of heterogeneous references through explicit per-reference region assignment. TimeColor encodes references as additional latent frames concatenated along the temporal axis, so they are processed concurrently with the video frames in every diffusion step while the model's parameter count stays fixed. TimeColor also combines spatiotemporal correspondence-masked attention, which enforces subject-reference binding, with modality-disjoint RoPE indexing. Together, these mechanisms mitigate shortcutting and cross-identity palette leakage. Experiments on SAKUGA-42M under both single- and multi-reference protocols show that TimeColor improves color fidelity, identity consistency, and temporal stability over prior baselines.
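
The abstract names these mechanisms but does not specify them. The minimal pure-Python sketch below illustrates, under stated assumptions, how temporal concatenation, modality-disjoint temporal indices, and a correspondence mask could fit together. Everything in it is hypothetical: the `Token` layout, the `gap` offset that separates reference indices from video-frame indices, and the masking rules in `correspondence_mask` are illustrative choices, not TimeColor's actual implementation. Concatenation only lengthens the token sequence and adds no weights, which is consistent with the claim that the parameter count stays fixed.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    modality: str  # "video" (frame being colorized) or "ref" (reference image)
    t_pos: int     # temporal index that would feed the RoPE time axis
    region: int    # subject/region id this token belongs to
    ref_id: int    # index of the source reference, -1 for video tokens


def build_sequence(video_regions: List[List[int]], ref_regions: List[int],
                   ref_tokens: int, gap: int = 16) -> List[Token]:
    """Concatenate video-frame tokens and reference tokens along time.

    video_regions[t][i] is the region id of spatial token i in frame t.
    ref_regions[k] is the region explicitly assigned to reference k.
    HYPOTHETICAL indexing: references start at num_frames + gap, so the
    two modalities never share a temporal index.
    """
    seq = []
    for t, frame in enumerate(video_regions):
        for region in frame:
            seq.append(Token("video", t, region, -1))
    ref_start = len(video_regions) + gap
    for k, region in enumerate(ref_regions):
        for _ in range(ref_tokens):
            seq.append(Token("ref", ref_start + k, region, k))
    return seq


def correspondence_mask(seq: List[Token]) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k.

    HYPOTHETICAL rules chosen for illustration only.
    """
    n = len(seq)
    mask = [[False] * n for _ in range(n)]
    for qi, q in enumerate(seq):
        for ki, k in enumerate(seq):
            if q.modality == "video" and k.modality == "video":
                allowed = True  # video tokens share context across all frames
            elif q.modality == "video":
                allowed = q.region == k.region  # bind a subject to its own reference
            elif k.modality == "ref":
                allowed = q.ref_id == k.ref_id  # a reference attends within itself
            else:
                allowed = False  # references do not read from the sketch frames
            mask[qi][ki] = allowed
    return mask


def masked_attention(scores: List[List[float]],
                     mask: List[List[bool]]) -> List[List[float]]:
    """Row-wise softmax over allowed keys; masked keys get weight 0."""
    weights = []
    for row, row_mask in zip(scores, mask):
        peak = max(s for s, m in zip(row, row_mask) if m)  # numerical stability
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    # Two frames of two tokens each: region 1 = character A, region 2 = character B.
    video = [[1, 2], [1, 2]]
    seq = build_sequence(video, ref_regions=[1, 2], ref_tokens=2)
    mask = correspondence_mask(seq)
    w = masked_attention([[0.0] * len(seq) for _ in seq], mask)
    print([round(x, 3) for x in w[0]])  # [0.167, 0.167, 0.167, 0.167, 0.167, 0.167, 0.0, 0.0]
```

With this mask, a region-1 token gives zero weight to the reference bound to region 2, which is one way cross-identity palette leakage could be blocked, while video tokens still attend across all frames. A real model would apply such a boolean mask inside a batched attention kernel over learned query-key scores rather than the uniform scores used here.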
