CVJun 21, 2025

LoLA-SpecViT: Local Attention SwiGLU Vision Transformer with LoRA for Hyperspectral Imaging

arXiv:2506.17759v16 citationsh-index: 35Has CodeItc J
Originality Incremental advance
AI Analysis

This work addresses the problem of scalable and adaptable hyperspectral imaging for applications in agriculture and remote sensing, though it is incremental as it builds on existing transformer and LoRA methods.

The paper tackled hyperspectral image classification by proposing LoLA-SpecViT, a lightweight vision transformer that achieved up to 99.91% accuracy with over 80% fewer trainable parameters, enhancing robustness in low-label conditions.

Hyperspectral image classification remains a challenging task due to the high dimensionality of spectral data, significant inter-band redundancy, and the limited availability of annotated samples. While recent transformer-based models have improved the global modeling of spectral-spatial dependencies, their scalability and adaptability under label-scarce conditions remain limited. In this work, we propose \textbf{LoLA-SpecViT}(Low-rank adaptation Local Attention Spectral Vision Transformer), a lightweight spectral vision transformer that addresses these limitations through a parameter-efficient architecture tailored to the unique characteristics of hyperspectral imagery. Our model combines a 3D convolutional spectral front-end with local window-based self-attention, enhancing both spectral feature extraction and spatial consistency while reducing computational complexity. To further improve adaptability, we integrate low-rank adaptation (LoRA) into attention and projection layers, enabling fine-tuning with over 80\% fewer trainable parameters. A novel cyclical learning rate scheduler modulates LoRA adaptation strength during training, improving convergence and generalisation. Extensive experiments on three benchmark datasets WHU-Hi LongKou, WHU-Hi HongHu, and Salinas demonstrate that LoLA-SpecViT consistently outperforms state-of-the-art baselines, achieving up to 99.91\% accuracy with substantially fewer parameters and enhanced robustness under low-label regimes. The proposed framework provides a scalable and generalizable solution for real-world HSI applications in agriculture, environmental monitoring, and remote sensing analytics. Our code is available in the following \href{https://github.com/FadiZidiDz/LoLA-SpecViT}{GitHub Repository}.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes