IV CVSep 24, 2022

S^2-Transformer for Mask-Aware Hyperspectral Image Reconstruction

Jiamian Wang, Kunpeng Li, Yulun Zhang, Xin Yuan, Zhiqiang Tao

arXiv:2209.12075v32.7Has Code

Originality Incremental advance

AI Analysis

This work improves hyperspectral imaging for applications like remote sensing or medical diagnostics, but it is incremental as it builds on existing Transformer-based methods.

The paper tackled the problem of hyperspectral image reconstruction in snapshot compressive imaging by addressing limitations in spatial and spectral attention designs and masked data loss, resulting in a method that outperforms state-of-the-art approaches on average.

Snapshot compressive imaging (SCI) surges as a novel way of capturing hyperspectral images. It operates an optical encoder to compress the 3D data into a 2D measurement and adopts a software decoder for the signal reconstruction. Recently, a representative SCI set-up of coded aperture snapshot compressive imager (CASSI) with Transformer reconstruction backend remarks high-fidelity sensing performance. However, dominant spatial and spectral attention designs show limitations in hyperspectral modeling. The spatial attention values describe the inter-pixel correlation but overlook the across-spectra variation within each pixel. The spectral attention size is unscalable to the token spatial size and thus bottlenecks information allocation. Besides, CASSI entangles the spatial and spectral information into a 2D measurement, placing a barrier for information disentanglement and modeling. In addition, CASSI blocks the light with a physical binary mask, yielding the masked data loss. To tackle above challenges, we propose a spatial-spectral (S2-) Transformer implemented by a paralleled attention design and a mask-aware learning strategy. Firstly, we systematically explore pros and cons of different spatial (-spectral) attention designs, based on which we find performing both attentions in parallel well disentangles and models the blended information. Secondly, the masked pixels induce higher prediction difficulty and should be treated differently from unmasked ones. We adaptively prioritize the loss penalty attributing to the mask structure by referring to the mask-encoded prediction as an uncertainty estimator. We theoretically discuss the distinct convergence tendencies between masked/unmasked regions of the proposed learning strategy. Extensive experiments demonstrate that on average, the results of the proposed method are superior over the state-of-the-art method.

View on arXiv PDF Code

Similar