CV · Nov 22, 2017

The Devil is in the Middle: Exploiting Mid-level Representations for Cross-Domain Instance Matching

Qian Yu, Xiaobin Chang, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales

arXiv:1711.08106v2 · 15.495 citations

Originality: Highly original

AI Analysis

This paper addresses the problem of matching object instances across different domains in computer vision, and proposes an approach that improves on existing methods.

The paper tackles cross-domain instance matching by proposing a unified framework that exploits both high-level and mid-level features from deep neural networks. It reports that simple models built on this framework outperform state-of-the-art ones in fine-grained sketch-based image retrieval and person re-identification.

Many vision problems require matching images of object instances across different domains. These include fine-grained sketch-based image retrieval (FG-SBIR) and Person Re-identification (person ReID). Existing approaches attempt to learn a joint embedding space where images from different domains can be directly compared. In most cases, this space is defined by the output of the final layer of a deep neural network (DNN), which primarily contains features of a high semantic level. In this paper, we argue that both high and mid-level features are relevant for cross-domain instance matching (CDIM). Importantly, mid-level features already exist in earlier layers of the DNN. They just need to be extracted, represented, and fused properly with the final layer. Based on this simple but powerful idea, we propose a unified framework for CDIM. Instantiating our framework for FG-SBIR and ReID, we show that our simple models can easily beat the state-of-the-art models, which are often equipped with much more elaborate architectures.
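The abstract does not say how the mid-level features are extracted, represented, or fused, so the following is only a minimal sketch of the general idea under stated assumptions: mid-level feature maps are reduced by global average pooling, each descriptor is L2-normalised, and the results are concatenated with the final-layer embedding before distances are computed. The function names (`global_average_pool`, `l2_normalize`, `fuse`, `euclidean`) and the toy activations are hypothetical and are not taken from the paper.

```python
import math


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a C-dim vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length so no single layer dominates the distance."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def fuse(mid_feature_maps, final_embedding):
    """Concatenate pooled mid-level descriptors with the final-layer embedding.

    Assumption: plain concatenation; the paper may fuse features differently.
    """
    fused = []
    for fmap in mid_feature_maps:
        fused.extend(l2_normalize(global_average_pool(fmap)))
    fused.extend(l2_normalize(final_embedding))
    return fused


def euclidean(a, b):
    """Distance between two fused descriptors from different domains."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy activations standing in for one mid-level layer (2 channels, 2x2)
# and a 2-dim final-layer embedding, for a sketch and a photo.
sketch_mid = [[[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]]
photo_mid = [[[[2.0, 2.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]]]

sketch_desc = fuse(sketch_mid, [3.0, 4.0])
photo_desc = fuse(photo_mid, [4.0, 3.0])

print(len(sketch_desc), euclidean(sketch_desc, photo_desc))
```

In a real network the mid-level maps would be read from intermediate layers of the same DNN that produces the final embedding, and the fused descriptor would be trained with a matching loss. The sketch only shows how features from different depths can share one comparison space.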
