CV · AI · IV · Sep 29, 2025

AttentionViG: Cross-Attention-Based Dynamic Neighbor Aggregation in Vision GNNs

arXiv:2509.25570v12 citationsh-index: 116
Originality: Incremental advance
AI Analysis

This paper addresses a bottleneck in ViGs for image recognition, namely node-neighbor feature aggregation, and offers an incremental improvement over existing methods.

The paper tackles the need for a versatile node-neighbor feature aggregation method in Vision Graph Neural Networks (ViGs) by proposing a cross-attention-based approach, reporting state-of-the-art performance on ImageNet-1K and strong results on downstream tasks such as object detection, instance segmentation, and semantic segmentation.

Vision Graph Neural Networks (ViGs) have demonstrated promising performance in image recognition tasks against Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). An essential part of the ViG framework is the node-neighbor feature aggregation method. Although various graph convolution methods, such as Max-Relative, EdgeConv, GIN, and GraphSAGE, have been explored, a versatile aggregation method that effectively captures complex node-neighbor relationships without requiring architecture-specific refinements is needed. To address this gap, we propose a cross-attention-based aggregation method in which the query projections come from the node, while the key projections come from its neighbors. Additionally, we introduce a novel architecture called AttentionViG that uses the proposed cross-attention aggregation scheme to conduct non-local message passing. We evaluated the image recognition performance of AttentionViG on the ImageNet-1K benchmark, where it achieved SOTA performance. Additionally, we assessed its transferability to downstream tasks, including object detection and instance segmentation on MS COCO 2017, as well as semantic segmentation on ADE20K. Our results demonstrate that the proposed method not only achieves strong performance, but also maintains efficiency, delivering competitive accuracy with comparable FLOPs to prior vision GNN architectures.
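
The aggregation idea in the abstract (queries projected from a node, keys projected from its neighbors) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify how values are formed, how scores are normalized, or how the graph is built, so the sketch assumes a single head, value projections taken from the neighbors, softmax normalization over each node's neighbors, and a k-nearest-neighbor graph under Euclidean distance. The names `knn_indices`, `cross_attention_aggregate`, `w_q`, `w_k`, and `w_v` are hypothetical.

```python
import numpy as np


def knn_indices(x, k):
    """Indices of the k nearest neighbors (Euclidean) of each node, excluding the node itself."""
    sq = np.sum(x ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # pairwise squared distances, (N, N)
    np.fill_diagonal(dist2, np.inf)                       # a node is never its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]               # (N, k)


def cross_attention_aggregate(x, nbr_idx, w_q, w_k, w_v):
    """Aggregate neighbor features with attention weights computed per node.

    Assumptions (not taken from the paper): softmax over neighbors,
    values projected from neighbors, a single attention head.
    """
    q = x @ w_q                       # queries come from the node itself, (N, d)
    k = (x @ w_k)[nbr_idx]            # keys come from its neighbors, (N, K, d)
    v = (x @ w_v)[nbr_idx]            # values from neighbors (assumed), (N, K, d)

    scores = np.einsum("nd,nkd->nk", q, k) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1

    aggregated = np.einsum("nk,nkd->nd", weights, v)      # weighted sum of neighbor values
    return aggregated, weights


# Toy usage: 16 nodes (e.g. image patches) with 8-dimensional features.
rng = np.random.default_rng(0)
num_nodes, dim, num_neighbors = 16, 8, 4
x = rng.standard_normal((num_nodes, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

nbr_idx = knn_indices(x, num_neighbors)
aggregated, attn = cross_attention_aggregate(x, nbr_idx, w_q, w_k, w_v)
```

Unlike a fixed reduction such as taking a maximum over neighbors, the weights here depend on each node-neighbor pair, so a node can emphasize some neighbors and down-weight others. In a full model the aggregated features would typically be combined with the node's own features and passed through further layers, a detail the abstract does not cover.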

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
