CV · Apr 8, 2022

Identifying Ambiguous Similarity Conditions via Semantic Matching

arXiv:2204.04053v1 · 9 citations · h-index: 40
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of conditional similarity learning in computer vision, but the advance is incremental because it builds on existing weakly supervised methods.

The paper tackles the problem of ambiguous similarity relationships between images by learning multiple embeddings that match semantic conditions without explicit labels, and introduces DiscoverNet, which achieves state-of-the-art performance on benchmarks such as UT-Zappos-50k and Celeb-A.

Rich semantics inside an image make its relationship with other images ambiguous, i.e., two images can be similar under one condition but dissimilar under another. Given triplets such as "aircraft" is more similar to "bird" than to "train", Weakly Supervised Conditional Similarity Learning (WS-CSL) learns multiple embeddings to match semantic conditions without explicit condition labels such as "can fly". However, the similarity relationships in a triplet are uncertain unless a condition is provided. For example, the previous comparison becomes invalid once the condition label changes to "is vehicle". To this end, we introduce a novel evaluation criterion that predicts the comparison's correctness after assigning the learned embeddings to their optimal conditions, which measures how well WS-CSL covers latent semantics relative to the supervised model. Furthermore, we propose the Distance Induced Semantic COndition VERification Network (DiscoverNet), which characterizes the instance-instance and triplet-condition relations in a "decompose-and-fuse" manner. To make the learned embeddings cover all semantics, DiscoverNet uses a set module or an additional regularizer over the correspondence between a triplet and a condition. DiscoverNet achieves state-of-the-art performance on benchmarks such as UT-Zappos-50k and Celeb-A w.r.t. different criteria.
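
The evaluation idea in the abstract (score each learned embedding against each condition, then assign embeddings to their best conditions) can be illustrated with a small sketch. This is not the paper's implementation: the function names, the toy 1-D embeddings, the squared-Euclidean distance, and the brute-force one-to-one matching are all assumptions made for illustration, and the sketch assumes the number of learned embedding spaces equals the number of conditions.

```python
from itertools import permutations

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_correct(emb, anchor, pos, neg):
    """True if the anchor is strictly closer to pos than to neg in this space."""
    return sq_dist(emb[anchor], emb[pos]) < sq_dist(emb[anchor], emb[neg])

def accuracy_matrix(embeddings, triplets, num_conditions):
    """acc[k][c]: fraction of condition-c triplets that space k orders correctly."""
    acc = []
    for emb in embeddings:
        row = []
        for c in range(num_conditions):
            subset = [(a, p, n) for a, p, n, cond in triplets if cond == c]
            hits = sum(triplet_correct(emb, a, p, n) for a, p, n in subset)
            row.append(hits / len(subset) if subset else 0.0)
        acc.append(row)
    return acc

def best_assignment(acc):
    """Brute-force one-to-one matching of spaces to conditions (small K only)."""
    k = len(acc)
    best = max(permutations(range(k)),
               key=lambda perm: sum(acc[i][perm[i]] for i in range(k)))
    return best, sum(acc[i][best[i]] for i in range(k)) / k

# Hypothetical toy data: condition 0 = "can fly", condition 1 = "is vehicle".
# Condition labels are used only for evaluation, never for training.
triplets = [
    ("aircraft", "bird", "train", 0),  # under "can fly", aircraft ~ bird
    ("aircraft", "train", "bird", 1),  # under "is vehicle", aircraft ~ train
]
# Two learned spaces whose order is unknown to the evaluator.
embeddings = [
    {"aircraft": [1.0], "bird": [0.0], "train": [1.0]},  # behaves like "is vehicle"
    {"aircraft": [1.0], "bird": [1.0], "train": [0.0]},  # behaves like "can fly"
]

acc = accuracy_matrix(embeddings, triplets, num_conditions=2)
assignment, score = best_assignment(acc)
print(acc, assignment, score)
```

In this toy case neither space is labeled, yet the matching step recovers which space corresponds to which condition before accuracy is reported. For more than a handful of conditions, the brute-force search over permutations would need to be replaced by a proper assignment solver.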

Code Implementations: 1 repo