CV · Apr 1, 2022

Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes

arXiv:2204.00656v1 · 14 citations · h-index: 8
Originality: Highly original
AI Analysis

This work addresses the challenge of limited observation in applications such as aerial imaging, offering a novel method for efficient scene classification from partial data.

The paper tackles scene classification from partial observations with a Sequential Transformers Attention Model (STAM) that predicts each glimpse location from past glimpses alone. When the agent senses only 4% of the image area, adding the proposed consistency loss to the training objective yields 3% and 8% higher accuracy on ImageNet and fMoW, respectively. The agent also outperforms the previous state of the art while observing nearly 27% and 42% fewer pixels in glimpses on the two datasets.

Most hard attention models initially observe a complete scene to locate and sense informative glimpses, and predict the class label of a scene based on those glimpses. However, in many applications (e.g., aerial imaging), observing an entire scene is not always feasible due to the limited time and resources available for acquisition. In this paper, we develop a Sequential Transformers Attention Model (STAM) that observes a complete image only partially and predicts informative glimpse locations solely based on past glimpses. We design our agent using DeiT-distilled and train it with a one-step actor-critic algorithm. Furthermore, to improve classification performance, we introduce a novel training objective, which enforces consistency between the class distribution predicted by a teacher model from a complete image and the class distribution predicted by our agent using glimpses. When the agent senses only 4% of the total image area, the inclusion of the proposed consistency loss in our training objective yields 3% and 8% higher accuracy on ImageNet and fMoW datasets, respectively. Moreover, our agent outperforms the previous state of the art by observing nearly 27% and 42% fewer pixels in glimpses on ImageNet and fMoW, respectively.
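
The consistency objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state which divergence is used, so the KL divergence below is an assumption, as are the function names (`softmax`, `kl_divergence`, `consistency_loss`) and the toy logits. The sketch only shows the general idea: the agent's class distribution from glimpses is pulled toward the teacher's class distribution from the complete image.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(teacher_logits, agent_logits):
    """Hypothetical consistency term (assumed form, not confirmed by the source).

    teacher_logits: class scores from a teacher that sees the complete image.
    agent_logits:   class scores from the agent that sees only glimpses.
    """
    teacher_probs = softmax(teacher_logits)
    agent_probs = softmax(agent_logits)
    return kl_divergence(teacher_probs, agent_probs)


# Toy values, made up for illustration only.
teacher_logits = [2.0, 0.5, -1.0]
agent_logits_early = [0.2, 0.1, 0.0]   # few glimpses: nearly uniform prediction
agent_logits_late = [1.9, 0.6, -0.9]   # more glimpses: close to the teacher

loss_early = consistency_loss(teacher_logits, agent_logits_early)
loss_late = consistency_loss(teacher_logits, agent_logits_late)
print(loss_early, loss_late)
```

In this toy setting the loss shrinks as the agent's prediction approaches the teacher's, which is the behaviour such a term is meant to encourage. How the term is weighted against the classification and actor-critic losses is not specified here.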

Code Implementations: 1 repo
