CV | IV | SP | Mar 3, 2024

LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition

arXiv:2403.01412v12.0 | h-index: 17 | Has Code | ICLR

Originality: Highly original

AI Analysis

The paper addresses bandwidth limitations in optical signal acquisition for real-time detection applications, proposing a novel method for a known bottleneck.

The paper tackles bandwidth constraints in real-time hyperspectral signal acquisition by introducing LUM-ViT, a Vision Transformer variant with a learnable under-sampling mask for pre-acquisition modulation. It keeps the accuracy loss within 1.8% on ImageNet classification while sampling just 10% of the original image pixels.

Bandwidth constraints during signal acquisition frequently impede real-time detection applications. Hyperspectral data is a notable example, whose vast volume compromises real-time hyperspectral detection. To tackle this hurdle, we introduce a novel approach leveraging pre-acquisition modulation to reduce the acquisition volume. This modulation process is governed by a deep learning model, utilizing prior information. Central to our approach is LUM-ViT, a Vision Transformer variant. Uniquely, LUM-ViT incorporates a learnable under-sampling mask tailored for pre-acquisition modulation. To further optimize for optical calculations, we propose a kernel-level weight binarization technique and a three-stage fine-tuning strategy. Our evaluations reveal that, by sampling a mere 10% of the original image pixels, LUM-ViT maintains the accuracy loss within 1.8% on the ImageNet classification task. The method sustains near-original accuracy when implemented on real-world optical hardware, demonstrating its practicality. Code will be available at https://github.com/MaxLLF/LUM-ViT.
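To make the two mechanisms named in the abstract more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the under-sampling mask can be pictured as one learnable score per candidate measurement, with only the top-scoring fraction actually acquired, and it uses a generic sign-times-mean-magnitude scheme as a stand-in for kernel-level weight binarization. The function names `top_k_mask`, `apply_mask`, and `binarize_kernel` are hypothetical, and the sketch covers only the selection step after training, not how the scores are learned.

```python
from typing import List


def top_k_mask(scores: List[float], keep_ratio: float) -> List[int]:
    """Return a 0/1 mask keeping the highest-scoring entries.

    Assumption: `scores` stands in for learned per-measurement importance
    values; the paper's actual parameterization may differ.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(scores) * keep_ratio))
    # Rank indices by descending score; ties go to the lower index.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def apply_mask(measurements: List[float], mask: List[int]) -> List[float]:
    """Keep only the measurements the mask selects (the acquired subset)."""
    return [m for m, bit in zip(measurements, mask) if bit]


def binarize_kernel(weights: List[float]) -> List[float]:
    """Binarize one kernel to {-s, +s}, with s the mean absolute weight.

    Assumption: a generic sign/scale scheme used for illustration only,
    not necessarily the technique proposed in the paper.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale if w >= 0 else -scale for w in weights]


# Example: keep 20% of ten candidate measurements.
example_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.05, 0.4, 0.7, 0.6, 0.5]
example_mask = top_k_mask(example_scores, keep_ratio=0.2)
acquired = apply_mask([float(i) for i in range(10)], example_mask)
```

With `keep_ratio=0.1`, the same selection would correspond to the 10% sampling rate quoted in the abstract: only the masked-in measurements need to be read out, which is where the bandwidth saving comes from.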
