IV CV LG NC | Dec 17, 2024

Optimized two-stage AI-based Neural Decoding for Enhanced Visual Stimulus Reconstruction from fMRI Data

Lorenzo Veronese, Andrea Moglia, Luca Mainardi, Pietro Cerveri

arXiv:2412.13237v1 | 3.61 citations | h-index: 4 | J Neural Eng

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enhancing visual stimulus reconstruction from fMRI data for neuroscience and brain-computer interface applications, representing an incremental improvement over existing two-stage AI-based methods.

The paper tackles the problem of reconstructing visual stimuli from noisy fMRI data by proposing a non-linear deep network that improves the latent space representation. Compared with the state-of-the-art ridge-based model, it reports an increase of about 2% in structural similarity and an improvement of about 4% in semantic fidelity, measured by perceptual similarity.

AI-based neural decoding reconstructs visual perception by leveraging generative models to map brain activity, measured through functional MRI (fMRI), into latent hierarchical representations. Traditionally, ridge linear models transform fMRI into a latent space, which is then decoded using latent diffusion models (LDM) via a pre-trained variational autoencoder (VAE). Due to the complexity and noisiness of fMRI data, newer approaches split the reconstruction into two sequential steps: the first provides a rough visual approximation, and the second improves the stimulus prediction via an LDM conditioned on CLIP embeddings. This work proposes a non-linear deep network to improve the fMRI latent space representation, while also optimizing its dimensionality. Experiments on the Natural Scenes Dataset showed that the proposed architecture improved the structural similarity of the reconstructed image by about 2% with respect to the state-of-the-art model, which is based on a ridge linear transform. The semantics of the reconstructed image, measured by perceptual similarity, improved by about 4% with respect to the state of the art. The noise sensitivity analysis of the LDM showed that the first stage was fundamental to predicting a stimulus with high structural similarity. Conversely, feeding the LDM a heavily noised first-stage stimulus affected the semantics of the predicted stimulus less, while the structural similarity between the ground truth and the predicted stimulus was very poor. The findings underscore the importance of leveraging non-linear relationships between the BOLD signal and the latent representation, together with two-stage generative AI, for optimizing the fidelity of reconstructed visual stimuli from noisy fMRI data.
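To make the baseline mentioned in the abstract concrete, the following is a minimal sketch of a ridge linear map from fMRI voxel responses to a latent vector, fitted in closed form on synthetic data. It is not the authors' implementation: the function names (`fit_ridge`, `predict_latents`), the array sizes, and the regularization strength `alpha` are illustrative assumptions, and the random arrays merely stand in for real fMRI responses and VAE latents.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_trials, n_voxels) fMRI responses (synthetic here).
    Y: (n_trials, latent_dim) target latent vectors (synthetic here).
    """
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    # Solve the regularized normal equations instead of inverting explicitly.
    return np.linalg.solve(gram, X.T @ Y)

def predict_latents(X, W):
    """Map voxel responses to predicted latents with the fitted linear weights."""
    return X @ W

# Synthetic stand-in data; all sizes are hypothetical and far smaller than real fMRI.
rng = np.random.default_rng(0)
n_trials, n_voxels, latent_dim = 200, 50, 8
true_W = rng.normal(size=(n_voxels, latent_dim))
X = rng.normal(size=(n_trials, n_voxels))
Y = X @ true_W + 0.1 * rng.normal(size=(n_trials, latent_dim))

W = fit_ridge(X, Y, alpha=1.0)
Z_hat = predict_latents(X, W)
print(W.shape, Z_hat.shape)
```

In a two-stage pipeline of the kind described above, predicted latents such as `Z_hat` would be decoded by a pre-trained VAE into a rough image before LDM refinement. The paper's contribution is to replace this linear map with a non-linear deep network.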
