CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
This addresses a bottleneck for AI applications in spectral imaging across domains like medicine and remote sensing, offering improved generalizability over camera-specific models.
The paper tackles the problem of camera-specific models in spectral imaging, which arise because spectral cameras differ in channel dimensionality and captured wavelengths. It introduces CARL for camera-agnostic representation learning across RGB, multispectral, and hyperspectral modalities. Experiments in medical imaging, autonomous driving, and satellite imaging show robustness to spectral heterogeneity.
Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding, and is already established as a critical modality in remote sensing. However, variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies, leading to camera-specific models with limited generalizability and inadequate cross-camera applicability. To address this bottleneck, we introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities. To enable the conversion of a spectral image with any channel dimensionality to a camera-agnostic representation, we introduce a novel spectral encoder, featuring a self-attention-cross-attention mechanism, to distill salient spectral information into learned spectral representations. Spatio-spectral pre-training is achieved with a novel feature-based self-supervision strategy tailored to CARL. Large-scale experiments across the domains of medical imaging, autonomous driving, and satellite imaging demonstrate our model's unique robustness to spectral heterogeneity, with CARL outperforming the compared methods on datasets with simulated and real-world cross-camera spectral variations. The scalability and versatility of the proposed approach position our model as a backbone for future spectral foundation models.
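The key idea of the spectral encoder is that a spectrum with any number of channels C is mapped to a fixed number of learned spectral representations. The sketch below illustrates that shape contract only and is not CARL's actual architecture. Several choices are assumptions made for illustration: the sinusoidal wavelength encoding, the per-pixel (rather than patch-based) tokens, single-head attention without learned query/key/value projections, random stand-ins for the learned queries and the intensity projection, and the ordering of self-attention over channel tokens followed by cross-attention from the learned queries. The function names (`wavelength_encoding`, `encode_pixel_spectrum`) are hypothetical.

```python
import math
import random


def wavelength_encoding(wavelength_nm, dim):
    """Sinusoidal encoding of a channel's center wavelength (assumed scheme; dim must be even)."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.append(math.sin(wavelength_nm * freq))
        enc.append(math.cos(wavelength_nm * freq))
    return enc


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys_values):
    """Single-head scaled dot-product attention; keys and values share the same vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)])
    return out


def encode_pixel_spectrum(values, wavelengths_nm, learned_queries, value_proj):
    """Map a spectrum with any channel count C to len(learned_queries) tokens (hypothetical)."""
    dim = len(value_proj)
    # One token per channel: scaled intensity plus an encoding of its wavelength.
    tokens = [
        [v * p + e for p, e in zip(value_proj, wavelength_encoding(wl, dim))]
        for v, wl in zip(values, wavelengths_nm)
    ]
    # Self-attention: channel tokens exchange information (with a residual connection).
    mixed = attention(tokens, tokens)
    tokens = [[a + b for a, b in zip(t, m)] for t, m in zip(tokens, mixed)]
    # Cross-attention: a fixed set of queries pools the C channel tokens into K outputs.
    return attention(learned_queries, tokens)


if __name__ == "__main__":
    random.seed(0)
    DIM, NUM_QUERIES = 8, 4
    # Random stand-ins for parameters that would be learned during pre-training.
    queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]
    proj = [random.gauss(0, 1) for _ in range(DIM)]

    rgb = encode_pixel_spectrum([0.2, 0.5, 0.7], [450, 550, 650], queries, proj)
    hsi = encode_pixel_spectrum([0.1] * 100, [500 + 5 * i for i in range(100)], queries, proj)
    print(len(rgb), len(rgb[0]), len(hsi), len(hsi[0]))  # 4 8 4 8
```

A 3-channel RGB pixel and a 100-channel hyperspectral pixel both yield 4 tokens of width 8. Because of this fixed-size output, downstream spatial processing can, under these assumptions, be shared across cameras.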