CV · AI · NC · Mar 9, 2023

Natural scene reconstruction from fMRI signals using generative latent diffusion

arXiv:2303.05334v2 · 180 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reconstructing complex natural images from brain activity, which could impact brain-computer interfaces and neuroscience, though it builds incrementally on existing generative AI methods.

The paper tackles the problem of reconstructing natural scenes from fMRI signals by introducing a two-stage framework called Brain-Diffuser, which combines a VDVAE for low-level properties and a latent diffusion model for high-level features, and outperforms previous models both qualitatively and quantitatively on the Natural Scenes Dataset benchmark.
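The data flow of the two-stage framework can be sketched as plain function composition. This is a hypothetical wiring, not the authors' code: the class name `BrainDiffuserSketch` and all field names are illustrative, and each callable is a placeholder for a trained component (an fMRI-to-latent regressor, the VDVAE decoder, the feature predictors, and the latent diffusion image-to-image step).

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = Sequence[float]


@dataclass
class BrainDiffuserSketch:
    """Hypothetical wiring of the two stages; every callable is a placeholder."""
    fmri_to_vdvae_latents: Callable[[Vector], Vector]     # stage 1 regressor (assumed)
    vdvae_decode: Callable[[Vector], object]              # VDVAE decoder -> low-level image
    fmri_to_text_features: Callable[[Vector], Vector]     # predicted text conditioning
    fmri_to_vision_features: Callable[[Vector], Vector]   # predicted visual conditioning
    img2img: Callable[[object, Vector, Vector], object]   # latent diffusion image-to-image

    def reconstruct(self, fmri: Vector) -> object:
        # Stage 1: an initial image carrying low-level properties and layout.
        init_image = self.vdvae_decode(self.fmri_to_vdvae_latents(fmri))
        # Stage 2: refine that image, conditioned on predicted multimodal features.
        text_feats = self.fmri_to_text_features(fmri)
        vision_feats = self.fmri_to_vision_features(fmri)
        return self.img2img(init_image, text_feats, vision_feats)
```

The point of the split is that stage 2 does not start from noise: the diffusion model receives the stage-1 image as its starting point, so the layout decoded in stage 1 can carry through while the predicted features steer the semantics.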

In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images based on fMRI signals. Previous studies have succeeded in re-creating different aspects of the visuals, such as low-level properties (shape, texture, layout) or high-level features (category of objects, descriptive semantics of scenes) but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating high-complexity images. Here, we investigate how to take advantage of this innovative technology for brain decoding. We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion) conditioned on predicted multimodal (text and visual) features, to generate final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling "ROI-optimal" scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g. brain-computer interface) and fundamental neuroscience.
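The abstract does not say how fMRI signals are mapped to VDVAE latents or to the multimodal features, nor exactly how an ROI mask becomes a synthetic fMRI pattern. The sketch below assumes a linear ridge regression (a common choice in fMRI decoding) and a binary pattern with 1.0 inside the ROI and 0.0 elsewhere; `fit_ridge`, `roi_synthetic_pattern`, and all data are hypothetical and synthetic.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression without intercept: W = (X^T X + alpha*I)^-1 X^T Y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)


def roi_synthetic_pattern(roi_mask):
    """Hypothetical synthetic fMRI vector: 1.0 for voxels in the ROI, 0.0 elsewhere."""
    return roi_mask.astype(float)


rng = np.random.default_rng(0)
n_trials, n_voxels, n_latent = 200, 50, 8
X = rng.standard_normal((n_trials, n_voxels))          # synthetic "fMRI" responses
W_true = rng.standard_normal((n_voxels, n_latent))
Y = X @ W_true + 0.1 * rng.standard_normal((n_trials, n_latent))  # synthetic target features

# Learn a voxel-to-feature decoder on the synthetic training data.
W = fit_ridge(X, Y, alpha=1.0)

# Decode a synthetic ROI pattern (first 10 voxels) into the feature space.
roi_mask = np.zeros(n_voxels, dtype=bool)
roi_mask[:10] = True
predicted_features = roi_synthetic_pattern(roi_mask) @ W
print(predicted_features.shape)  # (8,)
```

In a full pipeline, features decoded this way from an ROI pattern would be passed to the generative stages in place of features decoded from real fMRI, which is how a trained decoder can produce an image for a brain region rather than for a recorded stimulus.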

Code Implementations: 1 repo
