CV · Aug 21, 2024

ControlCol: Controllability in Automatic Speaker Video Colorization

arXiv:2408.11711v1 · 2 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses the need for interactive, controllable colorization of speaker videos, offering an incremental improvement over existing methods.

The paper tackles automatic speaker video colorization by introducing ControlCol, a system that gives the user control over the colorization while maintaining high quality. It outperforms the previous state-of-the-art, DeOldify, by 3.5% on the Grid and Lombard Grid datasets under PSNR, SSIM, FID and FVD, and is preferred 90% of the time in a head-to-head human evaluation.

Adding color to black-and-white speaker videos automatically is a highly desirable technique. It is an artistic process that requires interactivity with humans for the best results. Many existing automatic video colorization systems provide little opportunity for the user to guide the colorization process. In this work, we introduce a novel automatic speaker video colorization system which provides controllability to the user while also maintaining high colorization quality relative to state-of-the-art techniques. We name this system ControlCol. ControlCol performs 3.5% better than the previous state-of-the-art DeOldify on the Grid and Lombard Grid datasets when PSNR, SSIM, FID and FVD are used as metrics. This result is also supported by our human evaluation, where in a head-to-head comparison, ControlCol is preferred 90% of the time to DeOldify. Example videos can be seen in the supplementary material.
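
To make the reported numbers concrete, here is a minimal sketch of how PSNR, one of the four metrics named in the abstract, is conventionally computed, and of what a relative gain over a baseline looks like. This is not the authors' evaluation code. The function names `psnr` and `relative_gain` are hypothetical, pixel values are assumed to be 8-bit (peak value 255), and the abstract does not say how the 3.5% figure is aggregated across metrics. Note also that lower is better for FID and FVD, so a gain on those metrics would be a decrease.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes flat sequences of pixel intensities in [0, max_value].
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the ground-truth and colorized pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_gain(new_score, baseline_score):
    """Percentage improvement of new_score over baseline_score (higher is better)."""
    return 100.0 * (new_score - baseline_score) / baseline_score
```

In practice the pixel sequences would be the flattened frames of a ground-truth color video and of its colorized counterpart, with the score averaged over frames.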

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
