CV | LG | IV | Apr 6, 2021

Fourier Image Transformer

arXiv:2104.02555v3 | 26 citations
AI Analysis

This paper addresses image reconstruction and analysis problems in domains such as medical imaging by operating directly in Fourier space.

The paper proposes a sequential image representation based on Fourier Domain Encodings (FDEs), which enables auto-regressive image completion and querying of Fourier coefficients, and demonstrates the approach on CT image reconstruction.

Transformer architectures show spectacular performance on NLP tasks and have recently also been used for tasks such as image completion or image classification. Here we propose to use a sequential image representation, where each prefix of the complete sequence describes the whole image at reduced resolution. Using such Fourier Domain Encodings (FDEs), an auto-regressive image completion task is equivalent to predicting a higher resolution output given a low-resolution input. Additionally, we show that an encoder-decoder setup can be used to query arbitrary Fourier coefficients given a set of Fourier domain observations. We demonstrate the practicality of this approach in the context of computed tomography (CT) image reconstruction. In summary, we show that Fourier Image Transformer (FIT) can be used to solve relevant image analysis tasks in Fourier space, a domain inherently inaccessible to convolutional architectures.
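The prefix property described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the sequence is built by sorting the real-FFT coefficients of an image by radial frequency, so that any prefix holds only the lowest frequencies. The function names (`fde_order`, `encode`, `decode_prefix`) are hypothetical, and the ordering, normalization, and token encoding used in the paper itself are not described in this summary and may differ.

```python
import numpy as np


def fde_order(shape):
    """Flat indices of the rFFT half-plane, sorted from low to high radial frequency.

    Assumption: radial ordering in cycles per pixel; this is an illustrative choice.
    """
    h, w = shape
    fy = np.fft.fftfreq(h)    # vertical frequencies (positive and negative)
    fx = np.fft.rfftfreq(w)   # non-negative horizontal frequencies
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return np.argsort(radius.ravel(), kind="stable")


def encode(image):
    """Turn a real 2D image into a 1D sequence of complex Fourier coefficients."""
    coeffs = np.fft.rfft2(image)
    order = fde_order(image.shape)
    return coeffs.ravel()[order], order


def decode_prefix(sequence, order, shape, prefix_len):
    """Reconstruct an image from the first `prefix_len` coefficients only.

    Coefficients outside the prefix are set to zero, which acts as a low-pass filter.
    """
    h, w = shape
    flat = np.zeros(h * (w // 2 + 1), dtype=complex)
    flat[order[:prefix_len]] = sequence[:prefix_len]
    return np.fft.irfft2(flat.reshape(h, w // 2 + 1), s=shape)
```

Calling `decode_prefix` with a short prefix returns a blurred version of the whole image, and the full sequence returns the original up to floating-point error. Under this assumed ordering, predicting the remaining sequence elements from a prefix amounts to adding the higher-frequency detail that the low-resolution input lacks.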

Code Implementations: 1 repo
