CV, Jan 1

Attention to Detail: Global-Local Attention for High-Resolution AI-Generated Image Detection

arXiv:2601.00141v1
h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the detection of realistic AI-generated images for security and verification applications. It is an incremental improvement in detection methods.

The paper tackles the detection of high-resolution AI-generated images by proposing GLASS, an architecture that combines a resized global view with local crops kept at original resolution, so fine detail is not lost to downsampling. It reports higher predictive performance than standard transfer learning.

The rapid development of generative AI has made AI-generated images increasingly realistic and high-resolution. Most AI-generated image detection architectures downsample images before feeding them to the model, which risks losing fine-grained details. This paper presents GLASS (Global-Local Attention with Stratified Sampling), an architecture that combines a globally resized view with multiple randomly sampled local crops. The crops are original-resolution regions selected efficiently through spatially stratified sampling, and they are aggregated using attention-based scoring. GLASS can be integrated into vision models to leverage both global and local information in images of any size. Vision Transformer, ResNet, and ConvNeXt models are used as backbones, and experiments show that GLASS achieves higher predictive performance than standard transfer learning within feasible computational constraints.
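
The two mechanisms named in the abstract, spatially stratified crop sampling and attention-based aggregation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the grid-based stratification, the function names `stratified_crop_origins` and `attention_pool`, the scoring vector `w`, and the mean-colour "backbone" are all assumptions made for illustration, since the summary does not specify these details. The globally resized view is omitted; in a full model it would presumably contribute one more feature row.

```python
import numpy as np

def stratified_crop_origins(height, width, crop, grid, rng):
    """Pick one crop origin per cell of a grid x grid partition.

    Assumed stratification scheme: one uniformly random origin per cell,
    clamped so that the crop stays inside the image.
    """
    if crop > height or crop > width:
        raise ValueError("crop must fit inside the image")
    ys = np.linspace(0, height, grid + 1)
    xs = np.linspace(0, width, grid + 1)
    origins = []
    for i in range(grid):
        for j in range(grid):
            y = rng.uniform(ys[i], ys[i + 1])
            x = rng.uniform(xs[j], xs[j + 1])
            # Clamp so the original-resolution crop does not leave the image.
            origins.append((int(min(y, height - crop)),
                            int(min(x, width - crop))))
    return origins

def attention_pool(features, w):
    """Softmax-weighted average of per-crop features.

    `w` is a hypothetical scoring vector; a real model would learn it.
    """
    scores = features @ w
    scores = scores - scores.max()          # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ features, weights

rng = np.random.default_rng(0)
image = rng.random((1024, 1536, 3))         # stand-in high-resolution image
origins = stratified_crop_origins(1024, 1536, crop=224, grid=3, rng=rng)
crops = [image[y:y + 224, x:x + 224] for y, x in origins]

# Stand-in for a backbone: mean colour of each crop as a 3-dim feature.
features = np.stack([c.mean(axis=(0, 1)) for c in crops])
pooled, weights = attention_pool(features, w=np.ones(3))
```

Stratifying by grid cell spreads the crops across the image rather than letting them cluster, and the softmax weights let the model emphasise the crops it scores as most informative. Note that clamping biases crops in the last row and column of cells toward the image border.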

