CV LG ML | Aug 18, 2023

Generalized Sum Pooling for Metric Learning

Yeti Z. Gurbuz, Ozan Sener, A. Aydın Alatan

arXiv:2308.09228v23.910 citations | h-index: 23 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a specific architectural bottleneck in metric learning for computer vision applications, offering an incremental improvement over existing pooling methods.

The paper tackles the limitation of global average pooling (GAP) in deep metric learning by proposing a learnable generalized sum pooling (GSP) method that selects subsets of semantic entities and weights their importance, achieving improved performance on four popular benchmarks.

A common architectural choice for deep metric learning is a convolutional neural network followed by global average pooling (GAP). Albeit simple, GAP is a highly effective way to aggregate information. One possible explanation for the effectiveness of GAP is considering each feature vector as representing a different semantic entity and GAP as a convex combination of them. Following this perspective, we generalize GAP and propose a learnable generalized sum pooling method (GSP). GSP improves GAP with two distinct abilities: i) the ability to choose a subset of semantic entities, effectively learning to ignore nuisance information, and ii) learning the weights corresponding to the importance of each entity. Formally, we propose an entropy-smoothed optimal transport problem and show that it is a strict generalization of GAP, i.e., a specific realization of the problem gives back GAP. We show that this optimization problem enjoys analytical gradients enabling us to use it as a direct learnable replacement for GAP. We further propose a zero-shot loss to ease the learning of GSP. We show the effectiveness of our method with extensive evaluations on 4 popular metric learning benchmarks. Code is available at: GSP-DML Framework
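The convex-combination view of GAP in the abstract can be illustrated with a small sketch. This is not the paper's optimal transport formulation: the softmax weighting, the `scores` input, and the function names `gap` and `weighted_sum_pool` are assumptions made purely for illustration. It only shows that uniform weights recover GAP and that pushing a weight toward zero effectively drops a feature from the pooled result.

```python
import math


def gap(features):
    """Global average pooling: a uniform convex combination of feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def weighted_sum_pool(features, scores, temperature=1.0):
    """Convex combination of features with softmax weights.

    Assumption: softmax over hypothetical per-feature scores stands in for
    the learned weights; the paper derives its weights from an
    entropy-smoothed optimal transport problem instead.
    """
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Three toy feature vectors; the third plays the role of nuisance information.
features = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]

# Equal scores give uniform weights, so the result coincides with GAP.
uniform_pooled, uniform_w = weighted_sum_pool(features, [0.0, 0.0, 0.0])

# A very low score drives the third weight toward zero, ignoring that feature.
skewed_pooled, skewed_w = weighted_sum_pool(features, [5.0, 5.0, -50.0])
```

With equal scores the pooled vector matches `gap(features)`; with the skewed scores the third feature contributes almost nothing, which mirrors the subset-selection behaviour the abstract describes for GSP.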
