CV · Jun 2, 2014

Generalized Max Pooling

arXiv:1406.0312v1 · 226 citations
Originality: Incremental advance
AI Analysis

The paper addresses a pooling bottleneck in patch-based image classification and offers an incremental improvement over existing pooling methods.

It tackles the limited discriminability of sum-pooled patch representations by proposing Generalized Max Pooling (GMP), which equalizes the influence of frequent and rare descriptors and, unlike max-pooling, applies beyond bag-of-visual-words representations, notably to the Fisher Vector. On five public image classification benchmarks, the authors report significant performance gains over heuristic alternatives.

State-of-the-art patch-based image representations involve a pooling operation that aggregates statistics computed from local descriptors. Standard pooling operations include sum- and max-pooling. Sum-pooling lacks discriminability because the resulting representation is strongly influenced by frequent yet often uninformative descriptors, but only weakly influenced by rare yet potentially highly-informative ones. Max-pooling equalizes the influence of frequent and rare descriptors but is only applicable to representations that rely on count statistics, such as the bag-of-visual-words (BOV) and its soft- and sparse-coding extensions. We propose a novel pooling mechanism that achieves the same effect as max-pooling but is applicable beyond the BOV and especially to the state-of-the-art Fisher Vector -- hence the name Generalized Max Pooling (GMP). It involves equalizing the similarity between each patch and the pooled representation, which is shown to be equivalent to re-weighting the per-patch statistics. We show on five public image classification benchmarks that the proposed GMP can lead to significant performance gains with respect to heuristic alternatives.
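The equalization idea in the abstract can be illustrated with a small NumPy sketch. It assumes the per-patch embeddings are stacked as columns of a matrix `Phi`, and it asks for a pooled vector whose dot product with every patch embedding is (approximately) the same constant, here 1. The ridge term `lam`, the dual-form solve, and the function names `sum_pool` and `generalized_max_pool` are assumptions made for this illustration, not details stated in the abstract. Because the result is `Phi @ alpha`, it is a re-weighted sum of the per-patch statistics, which matches the equivalence the abstract mentions.

```python
import numpy as np


def sum_pool(Phi):
    """Sum-pooling: add up the per-patch embeddings (columns of Phi)."""
    return Phi.sum(axis=1)


def generalized_max_pool(Phi, lam=1e-6):
    """Sketch of similarity-equalizing pooling (illustrative, not the authors' code).

    Phi : (D, N) array, one column per patch embedding.
    lam : small ridge term (an assumption of this sketch) that keeps the
          linear system well-posed when patches are duplicated.

    Returns the pooled vector phi = Phi @ alpha and the per-patch weights
    alpha, chosen so that Phi.T @ phi is close to a vector of ones.
    """
    n_patches = Phi.shape[1]
    K = Phi.T @ Phi  # (N, N) patch-to-patch similarities
    alpha = np.linalg.solve(K + lam * np.eye(n_patches), np.ones(n_patches))
    return Phi @ alpha, alpha


# Toy image: nine copies of a frequent descriptor and one rare descriptor.
frequent = np.array([1.0, 0.0])
rare = np.array([0.0, 1.0])
Phi = np.column_stack([frequent] * 9 + [rare])

s = sum_pool(Phi)                      # dominated by the frequent descriptor
phi, alpha = generalized_max_pool(Phi)

print("sum-pooled:", s)
print("equalized :", np.round(phi, 4))
print("patch/pooled similarities:", np.round(Phi.T @ phi, 4))
```

In this toy case the sum-pooled vector weights the frequent descriptor nine times more than the rare one, whereas the equalized vector has nearly the same similarity to every patch, so each repeated patch receives a correspondingly smaller weight in `alpha`.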

