CV · AI · LG · Oct 23, 2021

An attention-driven hierarchical multi-scale representation for visual recognition

arXiv:2110.12178v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the challenge of subtle visual discrimination in computer vision applications and represents an incremental improvement over existing methods.

The paper tackles CNNs' inability to capture long-range dependencies in visual recognition, particularly in fine-grained tasks. It proposes an attention-driven hierarchical multi-scale representation built on Graph Convolutional Networks, which outperforms state-of-the-art methods on three datasets and is competitive on two others.

Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. This is mainly due to their ability to break down an image into smaller pieces, extract multi-scale localized features, and compose them to construct highly expressive representations for decision making. However, the convolution operation is unable to capture long-range dependencies, such as arbitrary relations between pixels, since it operates on a fixed-size window. Therefore, it may not be suitable for discriminating subtle changes (e.g., fine-grained visual recognition). To this end, our proposed method captures the high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs), which aggregate information by establishing relationships among multi-scale hierarchical regions. These regions range from smaller (closer look) to larger (far look), and the dependency between regions is modeled by an innovative attention-driven message propagation, guided by the graph structure to emphasize the neighborhoods of a given region. Our approach is simple yet extremely effective in solving both the fine-grained and generic visual classification problems. It outperforms the state of the art by a significant margin on three datasets and is very competitive on two others.
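The idea in the abstract, pooling features over small-to-large regions and letting them exchange information through attention restricted to graph neighbors, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the grid-based region layout, the overlap-based graph, the scaled dot-product attention, and all function names (`multiscale_regions`, `region_features`, `overlap_adjacency`, `attention_propagate`) are assumptions chosen for illustration, and the weights are random rather than learned.

```python
import numpy as np

def multiscale_regions(height, width, scales=(1, 2, 4)):
    """Hypothetical region layout: an s-by-s grid of boxes for each scale s.
    Small s gives large regions (far look); large s gives small ones (closer look)."""
    boxes = []
    for s in scales:
        ys = np.linspace(0, height, s + 1).astype(int)
        xs = np.linspace(0, width, s + 1).astype(int)
        for i in range(s):
            for j in range(s):
                boxes.append((ys[i], ys[i + 1], xs[j], xs[j + 1]))
    return boxes

def region_features(feature_map, boxes):
    """Average-pool an (H, W, C) feature map inside each box, giving (N, C)."""
    return np.stack([feature_map[y0:y1, x0:x1].mean(axis=(0, 1))
                     for y0, y1, x0, x1 in boxes])

def overlap_adjacency(boxes):
    """Assumed graph: two regions are neighbors if their boxes overlap.
    Every box overlaps itself, so self-loops are included."""
    n = len(boxes)
    adj = np.zeros((n, n), dtype=bool)
    for a, (ay0, ay1, ax0, ax1) in enumerate(boxes):
        for b, (by0, by1, bx0, bx1) in enumerate(boxes):
            adj[a, b] = ay0 < by1 and by0 < ay1 and ax0 < bx1 and bx0 < ax1
    return adj

def attention_propagate(x, adj, w):
    """One message-passing step: softmax attention over graph neighbors only."""
    h = x @ w                                   # project node features, (N, D)
    scores = h @ h.T / np.sqrt(h.shape[1])      # scaled dot-product scores, (N, N)
    scores = np.where(adj, scores, -np.inf)     # mask out non-neighbors
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)     # rows sum to 1
    return np.maximum(alpha @ h, 0.0), alpha    # ReLU of aggregated messages

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 16))          # stand-in for a CNN feature map
boxes = multiscale_regions(8, 8)                # 1 + 4 + 16 = 21 regions
x = region_features(fmap, boxes)
adj = overlap_adjacency(boxes)
w = rng.standard_normal((16, 32)) * 0.1         # random, untrained projection
out, alpha = attention_propagate(x, adj, w)
print(out.shape, alpha.shape)
```

Because the whole-image region overlaps every other region, it acts as a hub through which distant small regions can exchange information in two propagation steps, which is one simple way to obtain the long-range dependencies that a fixed-size convolution window lacks.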

