CV · Jun 3, 2022

Learning an Adaptation Function to Assess Image Visual Similarities

arXiv:2206.01417v1 · 3 citations · h-index: 15
Originality: Incremental advance
AI Analysis

The paper addresses a specific computer vision limitation for applications that require visual analogy understanding, though it appears incremental, as it expands on the authors' previous work.

The paper tackles the problem of assessing visual similarity between images when analogy matters, which existing deep learning features struggle with because they're optimized for semantic categorization. The proposed method learns an adaptation function to approximate primate visual cortex representations, achieving a 2.25x improvement in retrieval scores on the Totally Looks Like dataset.

Human perception routinely assesses the similarity between images, both for decision making and for creative thinking. The underlying cognitive process is not yet well understood, however, and is therefore difficult for computer vision systems to mimic. State-of-the-art approaches based on deep architectures often compare images described as feature vectors learned for an image categorization task. As a consequence, such features are effective for comparing semantically related images but much less so for comparing images that are visually similar yet semantically unrelated. Inspired by previous work on adapting neural features to psycho-cognitive representations, we focus here on the specific task of learning visual image similarities when analogy matters. We propose to compare different supervised, semi-supervised and self-supervised networks, pre-trained on datasets of distinct scales and contents (such as ImageNet-21k, ImageNet-1K or VGGFace2), to determine which model best approximates the visual cortex, and to learn only an adaptation function, corresponding to an approximation of the primate IT cortex, through the metric learning framework. Our experiments on the Totally Looks Like image dataset highlight the interest of our method, which increases the retrieval score @1 of the best model by 2.25x. This research work was recently accepted for publication at the ICIP 2021 international conference [1]. In this new article, we expand on this previous work by using and comparing new pre-trained feature extractors on other datasets.
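To make the evaluation setup concrete, the following minimal sketch shows how a retrieval score @k can be computed over paired image features, with and without an adaptation function applied on top of a frozen feature extractor. It is an illustration under stated assumptions, not the authors' implementation: the function names (`recall_at_k`, `adapt`), the toy features, and the hand-set linear map `W` are all hypothetical. In the paper the adaptation function is learned through metric learning; the abstract does not specify the loss or architecture, so no training step is shown here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def recall_at_k(queries, gallery, k=1, adapt=None):
    """Fraction of queries whose true match (same row index) ranks in the top k.

    `adapt` is an optional function applied to both sets of features
    before comparison (hypothetical stand-in for a learned adaptation).
    """
    if adapt is not None:
        queries, gallery = adapt(queries), adapt(gallery)
    sims = l2_normalize(queries) @ l2_normalize(gallery).T
    true_sim = np.diag(sims)[:, None]
    # Rank of the true match = number of gallery items strictly more similar.
    rank = (sims > true_sim).sum(axis=1)
    return float((rank < k).mean())


# Toy features (assumed): dimension 0 is a strong "semantic" component that
# disagrees within each true pair; dimensions 1-4 carry the shared visual cue.
visual = np.eye(4)
semantic = np.array([5.0, 5.0, -5.0, -5.0])[:, None]
gallery = np.hstack([semantic, visual])
queries = np.hstack([-semantic, visual])

# Hand-set linear adaptation that suppresses the semantic dimension.
# A learned adaptation would be fitted on training pairs instead.
W = np.diag([0.0, 1.0, 1.0, 1.0, 1.0])
adapt = lambda x: x @ W

print(recall_at_k(queries, gallery, k=1))               # 0.0
print(recall_at_k(queries, gallery, k=1, adapt=adapt))  # 1.0
```

In this toy setting the raw features retrieve semantically aligned but visually unrelated items, while the adapted features recover every true pair, which mirrors the motivation stated in the abstract.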

Code Implementations: 1 repo
