CV, AI, LG · Apr 25, 2019

Local Relation Networks for Image Recognition

arXiv:1904.11491v1 · 554 citations
Originality: Highly original
AI Analysis

This paper addresses a bottleneck in image recognition, the fixed filters of convolution, by proposing a novel feature extractor that improves semantic inference.

The paper tackles the inefficiency of fixed convolution filters in modeling visual elements with varying spatial distributions. It introduces a local relation layer that adaptively determines aggregation weights from the relationships between local pixel pairs. The resulting Local Relation Network (LR-Net) provides greater modeling capacity than its regular-convolution counterpart on ImageNet classification.

The convolution layer has been the dominant feature extractor in computer vision for years. However, the spatial aggregation in convolution is basically a pattern matching process that applies fixed filters which are inefficient at modeling visual elements with varying spatial distributions. This paper presents a new image feature extractor, called the local relation layer, that adaptively determines aggregation weights based on the compositional relationship of local pixel pairs. With this relational approach, it can composite visual elements into higher-level entities in a more efficient manner that benefits semantic inference. A network built with local relation layers, called the Local Relation Network (LR-Net), is found to provide greater modeling capacity than its counterpart built with regular convolution on large-scale recognition tasks such as ImageNet classification.
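The idea of aggregation weights that depend on local pixel pairs can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the paper's exact layer: the function name `local_relation_aggregate`, the projection matrices `w_q` and `w_k`, the dot-product similarity, and the zero padding at the border are all choices made here for illustration, and any further components of the published layer are omitted.

```python
import numpy as np

def local_relation_aggregate(x, w_q, w_k, k=3):
    """Aggregate each pixel's k x k neighbourhood with input-dependent weights.

    x:   (H, W, C) feature map.
    w_q: (C, D) hypothetical query projection (an assumption of this sketch).
    w_k: (C, D) hypothetical key projection (an assumption of this sketch).
    Returns the aggregated map (H, W, C) and the weights (H, W, k*k).
    """
    H, W, C = x.shape
    r = k // 2
    q = x @ w_q      # (H, W, D) query at every pixel
    key = x @ w_k    # (H, W, D) key at every pixel

    # Zero-pad so border pixels also see k*k positions (simplification:
    # padded positions take part in the softmax with a logit of 0).
    key_p = np.pad(key, ((r, r), (r, r), (0, 0)))
    x_p = np.pad(x, ((r, r), (r, r), (0, 0)))

    logits = np.empty((H, W, k * k))
    neighbours = np.empty((H, W, k * k, C))
    idx = 0
    for dy in range(k):
        for dx in range(k):
            k_shift = key_p[dy:dy + H, dx:dx + W]
            # Similarity between the centre pixel and this neighbour.
            logits[..., idx] = np.sum(q * k_shift, axis=-1)
            neighbours[..., idx, :] = x_p[dy:dy + H, dx:dx + W]
            idx += 1

    # Softmax over the window: weights differ per pixel, unlike a fixed filter.
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.einsum("hwn,hwnc->hwc", weights, neighbours)
    return out, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 6, 4))
w_q = rng.normal(size=(4, 2))
w_k = rng.normal(size=(4, 2))
out, weights = local_relation_aggregate(x, w_q, w_k, k=3)
print(out.shape, weights.shape)
```

In contrast to a convolution, whose k x k filter is the same at every location, the weights here are recomputed at each pixel from the content of its neighbourhood.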

Code Implementations: 5 repos
