ControlCol: Controllability in Automatic Speaker Video Colorization
This work addresses the need for interactive, controllable colorization of speaker videos, offering an incremental improvement over existing methods.
The paper tackles automatic speaker video colorization by introducing ControlCol, a system that gives the user control over the colorization while maintaining high quality. ControlCol outperforms the previous state-of-the-art, DeOldify, by 3.5% on the Grid and Lombard Grid datasets as measured by PSNR, SSIM, FID and FVD, and is preferred 90% of the time in a head-to-head human evaluation.
Adding color to black-and-white speaker videos automatically is a highly desirable technique. Colorization is an artistic process that requires human interaction for the best results, yet many existing automatic video colorization systems give the user little opportunity to guide it. In this work, we introduce ControlCol, a novel automatic speaker video colorization system that provides controllability to the user while maintaining high colorization quality relative to state-of-the-art techniques. ControlCol performs 3.5% better than the previous state-of-the-art, DeOldify, on the Grid and Lombard Grid datasets as measured by PSNR, SSIM, FID and FVD. This result is supported by our human evaluation, in which ControlCol is preferred to DeOldify 90% of the time in a head-to-head comparison. Example videos can be seen in the supplementary material.
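
For readers unfamiliar with the first of the reported metrics, the following is a minimal sketch of PSNR using its standard textbook definition. It is not taken from the ControlCol evaluation code: the function names `psnr` and `video_psnr`, the 8-bit peak value of 255, and the choice to average per-frame scores over a clip are all assumptions made for illustration.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped frames.

    max_value is the largest possible pixel value (255 assumed for 8-bit frames).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("frames must have the same shape")
    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_value ** 2 / mse))


def video_psnr(reference_frames, estimate_frames, max_value=255.0):
    """Mean per-frame PSNR over a clip (an assumed aggregation scheme)."""
    scores = [
        psnr(ref, est, max_value)
        for ref, est in zip(reference_frames, estimate_frames)
    ]
    return float(np.mean(scores))
```

Higher PSNR means the colorized frame is closer to the ground-truth color frame. How the four metrics are combined into the single 3.5% figure is not stated here, so the sketch does not attempt to reproduce it.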