DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
This work targets labor cost reduction in hand-drawn anime production. Its contribution is incremental: it builds on existing deep learning approaches and removes their limit on the number of reference images.
The paper tackles automatic colorization of anime line drawings, which is difficult because of occlusions, pose variations, and viewpoint changes. It proposes DACoN, a framework that accepts any number of reference images and reports superior colorization performance.
Automatic colorization of line drawings has been widely studied to reduce the labor cost of hand-drawn anime production. Deep learning approaches, including image/video generation and feature-based correspondence, have improved accuracy but struggle with occlusions, pose variations, and viewpoint changes. To address these challenges, we propose DACoN, a framework that leverages foundation models to capture part-level semantics, even in line drawings. Our method fuses low-resolution semantic features from foundation models with high-resolution spatial features from CNNs for fine-grained yet robust feature extraction. In contrast to previous methods that rely on the Multiplex Transformer and support only one or two reference images, DACoN removes this constraint, allowing any number of references. Quantitative and qualitative evaluations demonstrate the benefits of using multiple reference images, achieving superior colorization performance. Our code and model are available at https://github.com/kzmngt/DACoN.
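The two ideas in the abstract, fusing low-resolution semantic features with high-resolution spatial features and matching against any number of references, can be illustrated with a small sketch. This is not the DACoN implementation. The function names (`fuse_features`, `colorize_segments`), nearest-neighbor upsampling, channel concatenation, and cosine-similarity matching are all assumptions chosen for illustration; the abstract does not specify the fusion operator or the matching rule.

```python
import math

def fuse_features(low_res, high_res):
    """Upsample a low-res feature grid (nearest neighbor) to the high-res grid
    and concatenate channels per pixel. Assumed fusion rule, for illustration."""
    H, W = len(high_res), len(high_res[0])
    h, w = len(low_res), len(low_res[0])
    return [
        [high_res[y][x] + low_res[y * h // H][x * w // W] for x in range(W)]
        for y in range(H)
    ]

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def colorize_segments(target_feats, references):
    """Assign each target segment the color of its most similar reference segment.

    references: list of (segment_features, segment_colors) pairs, one pair per
    reference image. Segments from all references are pooled into one bank,
    so the number of reference images is not fixed.
    """
    bank = [(f, c) for feats, colors in references for f, c in zip(feats, colors)]
    if not bank:
        raise ValueError("at least one reference segment is required")
    return [max(bank, key=lambda fc: cosine(t, fc[0]))[1] for t in target_feats]

# Hypothetical toy data: one segment per reference, two target segments.
ref_a = ([[1.0, 0.0]], [(255, 224, 189)])   # e.g. a skin-colored segment
ref_b = ([[0.0, 1.0]], [(40, 30, 20)])      # e.g. a hair-colored segment
target = [[0.9, 0.1], [0.2, 0.8]]

one_ref = colorize_segments(target, [ref_a])
two_refs = colorize_segments(target, [ref_a, ref_b])
```

With only `ref_a`, both target segments receive the first color because no better match exists. Adding `ref_b` lets the second segment match the second color, which is the kind of benefit the abstract attributes to multiple references.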