Exploring Image Representation with Decoupled Classical Visual Descriptors
This work addresses interpretability in computer vision, for both researchers and practitioners, by integrating classical visual descriptors into modern representation learning. The contribution appears incremental, however, since it mainly combines existing concepts.
The paper tackles the opacity of deep learning representations by proposing VisualSplit, a framework that decomposes images into decoupled classical visual descriptors such as edges and colour, so that the learned representation remains interpretable. The resulting method supports attribute control in tasks such as image generation and editing, which the authors take as evidence of its effectiveness for visual understanding.
Exploring and understanding efficient image representations is a long-standing challenge in computer vision. While deep learning has achieved remarkable progress across image understanding tasks, its internal representations are often opaque, making it difficult to interpret how visual information is processed. In contrast, classical visual descriptors (e.g., edge, colour, and intensity distribution) have long been fundamental to image analysis and remain intuitively understandable to humans. Motivated by this gap, we ask a central question: Can modern learning benefit from these classical cues? In this paper, we answer it with VisualSplit, a framework that explicitly decomposes images into decoupled classical descriptors, treating each as an independent but complementary component of visual knowledge. Through a reconstruction-driven pre-training scheme, VisualSplit learns to capture the essence of each visual descriptor while preserving their interpretability. By explicitly decomposing visual attributes, our method inherently facilitates effective attribute control in various advanced visual tasks, including image generation and editing, extending beyond conventional classification and segmentation. This suggests the effectiveness of this new learning approach for visual understanding. Project page: https://chenyuanqu.com/VisualSplit/.
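As a rough illustration of what a decomposition into classical descriptors can look like, the sketch below splits a small RGB image into a Sobel edge map, a block-averaged colour map, and a normalised grey-level histogram. It is not the VisualSplit implementation: the function names (`decompose`, `edge_map`, `coarse_colour`, `intensity_histogram`), the choice of the Sobel operator, the threshold, the block size, and the bin count are all assumptions made for this example, and the code uses only the Python standard library.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (R, G, B) pixels, values in 0..255


def to_gray(img: Image) -> List[List[float]]:
    """Convert RGB to grey levels using ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in img]


def edge_map(gray: List[List[float]], threshold: float = 128.0) -> List[List[int]]:
    """Binary edge map from the Sobel gradient magnitude (border pixels stay 0)."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


def coarse_colour(img: Image, block: int = 2) -> List[List[Tuple[float, ...]]]:
    """Block-averaged RGB: coarse colour layout without fine structure."""
    h, w = len(img), len(img[0])
    out = []
    for y0 in range(0, h, block):
        row = []
        for x0 in range(0, w, block):
            px = [img[y][x]
                  for y in range(y0, min(y0 + block, h))
                  for x in range(x0, min(x0 + block, w))]
            row.append(tuple(sum(p[c] for p in px) / len(px) for c in range(3)))
        out.append(row)
    return out


def intensity_histogram(gray: List[List[float]], bins: int = 8) -> List[float]:
    """Normalised grey-level histogram: global brightness, no spatial layout."""
    counts = [0] * bins
    for row in gray:
        for v in row:
            counts[min(int(v * bins / 256), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def decompose(img: Image) -> Dict[str, list]:
    """Hypothetical helper: return the three descriptors as separate components."""
    gray = to_gray(img)
    return {
        "edges": edge_map(gray),
        "colour": coarse_colour(img),
        "intensity": intensity_histogram(gray),
    }


if __name__ == "__main__":
    # 4x4 test image: left half black, right half white.
    image = [[(0, 0, 0)] * 2 + [(255, 255, 255)] * 2 for _ in range(4)]
    parts = decompose(image)
    for name, value in parts.items():
        print(name, value)
```

Each output discards what the others keep: the edge map carries structure without colour, the colour map carries coarse appearance without fine structure, and the histogram carries global brightness statistics without spatial layout. That separation is what makes it possible, in principle, to alter one attribute while holding the others fixed.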