Learning to Compose Hypercolumns for Visual Correspondence
This work addresses the challenge of adaptive feature selection for semantic correspondence in computer vision, offering an efficient solution for matching images of different instances within the same category.
The paper tackles the problem of static feature representation in visual correspondence by introducing Dynamic Hyperpixel Flow, which composes hypercolumn features on the fly from relevant CNN layers, conditioned on the input images. On standard benchmarks, this improves matching performance over state-of-the-art methods.
Feature representation plays a crucial role in visual correspondence, and recent methods for image matching resort to deeply stacked convolutional layers. These models, however, are both monolithic and static in the sense that they typically use a specific level of features, e.g., the output of the last layer, and adhere to it regardless of the images to match. In this work, we introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match. Inspired by both multi-layer feature composition in object detection and adaptive inference architectures in classification, the proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network. We demonstrate its effectiveness on the task of semantic correspondence, i.e., establishing correspondences between images depicting different instances of the same object or scene category. Experiments on standard benchmarks show that the proposed method greatly improves matching performance over the state of the art in an adaptive and efficient manner.
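To make the idea of composing hypercolumns from a selected subset of layers concrete, the following is a minimal NumPy sketch, not the method's actual implementation. It assumes per-layer feature maps of shape (C, H, W), a binary gate per layer, nearest-neighbor upsampling to a shared resolution, and channel-wise concatenation of the layers that are switched on. The helper names (`upsample_nearest`, `compose_hypercolumns`, `toy_gates`) and the linear gating rule are hypothetical and chosen only for illustration; the abstract does not specify how the layer selection is computed or trained.

```python
import numpy as np


def upsample_nearest(feat, size):
    """Resize a (C, H, W) feature map to (C, size[0], size[1]) by nearest-neighbor lookup."""
    _, h, w = feat.shape
    rows = np.arange(size[0]) * h // size[0]  # source row for each output row
    cols = np.arange(size[1]) * w // size[1]  # source column for each output column
    return feat[:, rows[:, None], cols[None, :]]


def compose_hypercolumns(features, gates, size):
    """Concatenate the gated-on layers along the channel axis at a shared resolution."""
    selected = [upsample_nearest(f, size) for f, g in zip(features, gates) if g]
    if not selected:
        raise ValueError("at least one layer must be selected")
    return np.concatenate(selected, axis=0)


def toy_gates(src_feats, trg_feats, weights, biases):
    """Hypothetical gating rule: one linear score per layer, computed from the
    globally average-pooled features of BOTH images, thresholded at zero."""
    gates = []
    for fs, ft, w, b in zip(src_feats, trg_feats, weights, biases):
        pooled = np.concatenate([fs.mean(axis=(1, 2)), ft.mean(axis=(1, 2))])
        gates.append(bool(pooled @ w + b > 0))
    return gates


# Toy example: three "layers" with growing channel count and shrinking resolution.
rng = np.random.default_rng(0)
shapes = [(4, 16, 16), (8, 8, 8), (16, 4, 4)]
src = [rng.standard_normal(s) for s in shapes]
trg = [rng.standard_normal(s) for s in shapes]

# Assumed gate parameters: zero weights, so the bias sign alone decides each gate.
weights = [np.zeros(2 * s[0]) for s in shapes]
biases = [1.0, -1.0, 1.0]

gates = toy_gates(src, trg, weights, biases)  # layers 0 and 2 are kept
hyper_src = compose_hypercolumns(src, gates, size=(16, 16))
hyper_trg = compose_hypercolumns(trg, gates, size=(16, 16))
```

Because the gates in this sketch depend on pooled features of both images, the same subset of layers is applied to the source and the target, so their hypercolumns share a channel layout and can be compared position by position. Skipping a gated-off layer also avoids its upsampling and concatenation cost, which is one way to read the efficiency claim above.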