CV · DC · Jan 18, 2025

ClusterViG: Efficient Globally Aware Vision GNNs via Image Partitioning

arXiv:2501.10640v2 · 10 citations · h-index: 7 · 2025 International Conference on Multimedia Computing, Networking and Applications (MCNA)
Originality: Incremental advance
AI Analysis

The paper addresses slow inference in Vision GNNs, a concern for computer vision researchers and practitioners. It offers a scalable solution that improves efficiency incrementally over prior methods.

The paper tackles the efficiency bottleneck in Vision GNNs (ViGs) caused by expensive k-NN graph construction. It proposes ClusterViG, built on a novel Dynamic Efficient Graph Convolution (DEGC) method that partitions the input image so that a graph can be constructed for each partition in parallel, and that integrates local and global feature learning. The reported result is a reduction in end-to-end inference latency of up to 5x against comparable models, along with state-of-the-art performance on image classification, object detection, and instance segmentation.

Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have dominated the field of Computer Vision (CV). Graph Neural Networks (GNN) have performed remarkably well across diverse domains because they can represent complex relationships via unstructured graphs. However, the applicability of GNNs for visual tasks was unexplored till the introduction of Vision GNNs (ViG). Despite the success of ViGs, their performance is severely bottlenecked due to the expensive $k$-Nearest Neighbors ($k$-NN) based graph construction. Recent works addressing this bottleneck impose constraints on the flexibility of GNNs to build unstructured graphs, undermining their core advantage while introducing additional inefficiencies. To address these issues, in this paper, we propose a novel method called Dynamic Efficient Graph Convolution (DEGC) for designing efficient and globally aware ViGs. DEGC partitions the input image and constructs graphs in parallel for each partition, improving graph construction efficiency. Further, DEGC integrates local intra-graph and global inter-graph feature learning, enabling enhanced global context awareness. Using DEGC as a building block, we propose a novel CNN-GNN architecture, ClusterViG, for CV tasks. Extensive experiments indicate that ClusterViG reduces end-to-end inference latency for vision tasks by up to $5\times$ when compared against a suite of models such as ViG, ViHGNN, PVG, and GreedyViG, with a similar model parameter count. Additionally, ClusterViG reaches state-of-the-art performance on image classification, object detection, and instance segmentation tasks, demonstrating the effectiveness of the proposed globally aware learning strategy. Finally, input partitioning performed by DEGC enables ClusterViG to be trained efficiently on higher-resolution images, underscoring the scalability of our approach.
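The idea of partitioned graph construction described in the abstract can be illustrated with a small sketch. This is not the authors' DEGC implementation. It assumes contiguous, equal-size partitions of the node list, a brute-force k-NN inside each partition, and a plain mean aggregator in which the average of the partition centroids stands in for inter-graph (global) context. All function names here (`knn_graph`, `partition`, `partitioned_knn`, `local_global_update`, `distance_evals`) are hypothetical. Under these assumptions, a brute-force build over N nodes evaluates N(N-1) pairwise distances, while P equal partitions evaluate P · (N/P)(N/P - 1), roughly a factor of P fewer.

```python
import math
import random


def knn_graph(feats, k):
    """Brute-force k-NN: for each node, the indices of its k nearest other nodes."""
    neighbors = []
    for i, fi in enumerate(feats):
        dists = sorted((math.dist(fi, fj), j) for j, fj in enumerate(feats) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def partition(n_nodes, n_parts):
    """Assumed scheme: contiguous, equal-size groups of node indices."""
    assert n_nodes % n_parts == 0, "sketch assumes equal-size partitions"
    size = n_nodes // n_parts
    return [list(range(p * size, (p + 1) * size)) for p in range(n_parts)]


def partitioned_knn(feats, n_parts, k):
    """One independent k-NN graph per partition, reported in global node indices."""
    neighbors = [None] * len(feats)
    for part in partition(len(feats), n_parts):
        local = knn_graph([feats[i] for i in part], k)
        for li, nbrs in enumerate(local):
            neighbors[part[li]] = [part[j] for j in nbrs]
    return neighbors


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def local_global_update(feats, n_parts, k):
    """Hypothetical update: node + mean of local neighbours + mean of partition centroids."""
    parts = partition(len(feats), n_parts)
    nbrs = partitioned_knn(feats, n_parts, k)
    centroids = [mean([feats[i] for i in part]) for part in parts]
    global_ctx = mean(centroids)  # stand-in for inter-graph (global) information
    out = []
    for i, f in enumerate(feats):
        local_ctx = mean([feats[j] for j in nbrs[i]])  # intra-graph (local) information
        out.append([a + b + c for a, b, c in zip(f, local_ctx, global_ctx)])
    return out


def distance_evals(n_nodes, n_parts):
    """Number of pairwise distances a brute-force build computes."""
    size = n_nodes // n_parts
    return n_parts * size * (size - 1)


random.seed(0)
feats = [[random.random() for _ in range(8)] for _ in range(64)]  # 64 nodes, 8-dim features
updated = local_global_update(feats, n_parts=4, k=3)
print(distance_evals(64, 1), distance_evals(64, 4))  # 4032 960
```

The loop over partitions runs sequentially here for readability. Because each partition's graph depends only on its own nodes, the per-partition builds are independent and could be batched or run in parallel, which is the property the abstract attributes to DEGC.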

