Drawing Attention to Detail: Pose Alignment through Self-Attention for Fine-Grained Object Classification
This work addresses intra-class variation in fine-grained visual classification and offers an incremental improvement over existing part-based methods.
The paper tackles fine-grained object classification by proposing an end-to-end trainable attention-based parts alignment module that replaces graph-matching with self-attention to learn optimal part arrangements, achieving competitive results on benchmark datasets.
Intra-class variations in the open world pose many challenges for classification tasks. Fine-grained classification was introduced to address these challenges, and many approaches have been proposed. Some rely on locating and using distinguishable local parts within images to achieve invariance to viewpoint changes, intra-class differences, and local part deformations. Our approach, inspired by P2P-Net, offers an end-to-end trainable attention-based parts alignment module in which the graph-matching component of P2P-Net is replaced with a self-attention mechanism. The attention module learns the optimal arrangement of parts as the parts attend to each other, before they contribute to the global loss.
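To make the idea of parts attending to each other concrete, the following is a minimal sketch of single-head scaled dot-product self-attention applied to a set of part feature vectors. It is an illustration under stated assumptions, not the architecture of the paper: the function name `part_self_attention`, the random projection matrices, and the sizes (4 parts, 8 feature dimensions) are all hypothetical, and the part features are random stand-ins for what a part detector would produce.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def part_self_attention(parts, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over part features.

    parts: (num_parts, dim) array, one feature vector per local part.
    w_q, w_k, w_v: (dim, dim) projection matrices (learned in practice,
        random here for illustration).
    Returns (attended, weights); weights[i, j] is how strongly part i
    attends to part j, and each row of weights sums to 1.
    """
    q = parts @ w_q
    k = parts @ w_k
    v = parts @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise part-to-part scores
    weights = softmax(scores, axis=-1)       # normalize over attended parts
    attended = weights @ v                   # each part mixes in the others
    return attended, weights


# Hypothetical sizes and random inputs, chosen only for this sketch.
rng = np.random.default_rng(0)
num_parts, dim = 4, 8
parts = rng.normal(size=(num_parts, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

attended, weights = part_self_attention(parts, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

In an end-to-end setting, the projection matrices would be trained jointly with the rest of the network, so that the attended part features feed into the classification loss. How the paper combines these features with the global loss is not specified in this section and is left out of the sketch.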