CV · Apr 22, 2021

Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks

arXiv:2104.11200v1 · 19 citations
Originality: Incremental advance
AI Analysis

This paper addresses a practical challenge in aerial scene understanding for remote sensing applications, though the advance is incremental: it extends single-scene recognition to the multi-scene setting.

The paper tackles the recognition of multiple scenes in a single aerial image, a more practical setting than single-scene classification. It proposes a prototype-based memory network that leverages well-annotated single-scene images to reduce annotation effort, and reports effective results on a new multi-scene aerial image dataset.

Aerial scene recognition is a fundamental visual task and has attracted increasing research interest in the last few years. Most current research focuses on categorizing an aerial image into a single scene-level label, while in real-world scenarios a single image often contains multiple scenes. Therefore, in this paper, we propose to take a step forward to a more practical and challenging task, namely multi-scene recognition in single images. Moreover, we note that manually producing annotations for such a task is extraordinarily time- and labor-consuming. To address this, we propose a prototype-based memory network to recognize multiple scenes in a single image by leveraging massive well-annotated single-scene images. The proposed network consists of three key components: 1) a prototype learning module, 2) a prototype-inhabiting external memory, and 3) a multi-head attention-based memory retrieval module. To be more specific, we first learn the prototype representation of each aerial scene from single-scene aerial image datasets and store it in an external memory. Afterwards, a multi-head attention-based memory retrieval module is devised to retrieve the scene prototypes relevant to a query multi-scene image for the final prediction. Notably, only a limited number of annotated multi-scene images are needed in the training phase. To facilitate the progress of aerial scene recognition, we produce a new multi-scene aerial image (MAI) dataset. Experimental results on various dataset configurations demonstrate the effectiveness of our network. Our dataset and code are publicly available.
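
The three components named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation; it only shows the general idea under several assumptions that are not stated in the abstract: prototypes are taken as per-class mean features, attention heads are formed by slicing the feature vector with no learned projections, and the final multi-label prediction uses a linear layer followed by a sigmoid. All function names, shapes, and the random data are hypothetical.

```python
import numpy as np

def learn_prototypes(features, labels, num_classes):
    # Assumption: a prototype is the mean feature of each single-scene class.
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def retrieve(query, memory, num_heads):
    # Multi-head scaled dot-product attention of one query over the memory.
    # Simplification: heads are slices of the feature dimension (no learned weights).
    d = query.shape[0]
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_head = d // num_heads
    q = query.reshape(num_heads, d_head)                        # (H, d_head)
    m = memory.reshape(memory.shape[0], num_heads, d_head)      # (C, H, d_head)
    m = m.transpose(1, 0, 2)                                    # (H, C, d_head)
    scores = np.einsum("hd,hcd->hc", q, m) / np.sqrt(d_head)    # (H, C)
    weights = softmax(scores, axis=-1)                          # attention per head
    retrieved = np.einsum("hc,hcd->hd", weights, m).reshape(d)  # concatenate heads
    return retrieved, weights

def predict(retrieved, w, b):
    # Multi-label output: one independent sigmoid probability per scene class.
    return 1.0 / (1.0 + np.exp(-(w @ retrieved + b)))

# Toy data standing in for single-scene image features (hypothetical).
rng = np.random.default_rng(0)
num_classes, dim = 3, 8
labels = np.repeat(np.arange(num_classes), 10)
features = rng.normal(size=(labels.size, dim)) + labels[:, None]

memory = learn_prototypes(features, labels, num_classes)  # the external memory
query = 0.5 * memory[0] + 0.5 * memory[2]                 # mock multi-scene feature
retrieved, weights = retrieve(query, memory, num_heads=2)
w = rng.normal(size=(num_classes, dim))                   # untrained classifier weights
probs = predict(retrieved, w, np.zeros(num_classes))
```

In this sketch the memory is built once from single-scene data and then only read, which mirrors the abstract's point that few annotated multi-scene images are needed: only the retrieval and classification stages would have to be trained on them.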

Code Implementations: 1 repo
