CV · Jun 17, 2019

Multi-Scale Convolutions for Learning Context Aware Feature Representations

arXiv:1906.06978v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses semantic correspondence estimation in computer vision, offering incremental improvements in feature representation and matching accuracy.

The paper tackles semantic matching with a weakly supervised metric learning approach whose features encode more context than previous methods, yielding state-of-the-art results on several semantic matching benchmarks and stronger performance under simple nearest neighbor matching.
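Nearest neighbor matching assigns each descriptor in one image its most similar descriptor in the other. The sketch below is a minimal, hypothetical illustration in plain Python: the function names and the mutual-consistency filter are illustrative choices, not taken from the paper. Descriptors are compared by cosine similarity, and a match is kept only if it is the nearest neighbor in both directions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a descriptor to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two descriptors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def nearest_neighbors(src: List[Vector], dst: List[Vector]) -> List[int]:
    """For every source descriptor, the index of the most similar target descriptor."""
    return [max(range(len(dst)), key=lambda j: cosine_sim(s, dst[j])) for s in src]


def mutual_matches(src: List[Vector], dst: List[Vector]) -> List[Tuple[int, int]]:
    """Hypothetical filter: keep (i, j) only if i -> j and j -> i are both nearest neighbors."""
    fwd = nearest_neighbors(src, dst)
    bwd = nearest_neighbors(dst, src)
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]


# Usage with toy 2D descriptors (real features would be high-dimensional).
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dst = [[0.0, 2.0], [3.0, 0.1]]
print(mutual_matches(src, dst))  # [(0, 1), (1, 0)]
```

The mutual check drops the third source descriptor: its nearest target prefers a different source descriptor. This is one generic way to suppress ambiguous matches, not necessarily the protocol used in the paper's evaluation.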

Finding semantic correspondences is a challenging problem. With the breakthrough of CNNs, stronger features have become available for tasks like classification, but they are not tailored to the requirements of semantic matching. In the following, we present a weakly supervised metric learning approach which generates stronger features by encoding far more context than previous methods. First, we generate more suitable training data using a geometrically informed correspondence mining method which is less prone to spurious matches and requires only image category labels as supervision. Second, we introduce a new convolutional layer which is a learned mixture of differently strided convolutions and allows the network to implicitly encode more context while preserving matching accuracy. The strong geometric encoding on the feature side enables us to learn a semantic flow network, which generates more natural deformations than models based on parametric transformations and jointly predicts foreground regions. Our semantic flow network outperforms the current state of the art on several semantic matching benchmarks, and the learned features perform astonishingly well with simple nearest neighbor matching.
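The abstract does not specify how the mixture layer is built. The following is a minimal sketch under two assumptions. First, each branch is a 3x3 convolution with a different dilation, i.e. a different sampling stride over the input. Second, the branch outputs are combined with softmax-normalised learned weights. The names `multi_scale_conv`, `dilations` and `mix_logits` are hypothetical. The sketch works on a single-channel map in plain Python and omits training of the kernels and mixing logits.

```python
import math
from typing import List

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """'Same'-size 2D cross-correlation of a single-channel map with zero padding.

    A k x k kernel with dilation d samples the input on a grid spaced d pixels
    apart, so its receptive field grows to (k - 1) * d + 1 without extra weights.
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    r = k // 2  # kernel radius (k assumed odd)
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    yi = i + (u - r) * dilation
                    xj = j + (v - r) * dilation
                    if 0 <= yi < h and 0 <= xj < w:  # zero padding outside the map
                        acc += kernel[u][v] * x[yi][xj]
            out[i][j] = acc
    return out


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax, so the mixing weights are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def multi_scale_conv(x: Grid, kernels: List[Grid], dilations: List[int],
                     mix_logits: List[float]) -> Grid:
    """Hypothetical mixture layer: softmax(mix_logits)-weighted sum of dilated branches."""
    weights = softmax(mix_logits)
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for a, kern, d in zip(weights, kernels, dilations):
        branch = dilated_conv2d(x, kern, d)
        for i in range(h):
            for j in range(w):
                out[i][j] += a * branch[i][j]
    return out


# Usage: an impulse response makes each branch's sampling pattern visible.
impulse = [[1.0 if (i, j) == (2, 2) else 0.0 for j in range(5)] for i in range(5)]
ones3 = [[1.0] * 3 for _ in range(3)]
y = multi_scale_conv(impulse, [ones3, ones3], dilations=[1, 2], mix_logits=[0.0, 0.0])
for row in y:
    print(row)
```

Dilation keeps every branch at the input resolution, so the outputs can be summed directly. Larger dilations see more context with the same number of weights. The learned mixing weights are one plausible way for a network to balance that extra context against localisation accuracy.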
