CV | AI | Sep 27, 2023

MindGPT: Interpreting What You See with Non-invasive Brain Recordings

Jiaxuan Chen, Yu Qi, Yueming Wang, Gang Pan

arXiv:2309.15729v15.919 citations | h-index: 48 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of decoding visual content from brain signals, which has applications in neuroscience and brain-computer interfaces. It appears incremental because it builds on existing methods, although it combines them in a new hybrid approach.

The paper tackles the problem of decoding visual contents from non-invasive brain recordings (fMRI) by introducing MindGPT, which interprets perceived visual stimuli into natural language sequences, achieving results that truthfully represent visual information with essential details.

Decoding seen visual content from non-invasive brain recordings has important scientific and practical value. Efforts have been made to recover the seen images from brain signals. However, most existing approaches cannot faithfully reflect the visual content due to insufficient image quality or semantic mismatches. Compared with reconstructing pixel-level visual images, speaking is a more efficient and effective way to explain visual information. Here we introduce a non-invasive neural decoder, termed MindGPT, which interprets perceived visual stimuli into natural language from fMRI signals. Specifically, our model builds upon a visually guided neural encoder with a cross-attention mechanism, which permits us to guide latent neural representations towards a desired language semantic direction in an end-to-end manner by the collaborative use of the large language model GPT. By doing so, we found that the neural representations of MindGPT are explainable and can be used to evaluate the contributions of visual properties to language semantics. Our experiments show that the generated word sequences truthfully represent the visual information (with essential details) conveyed in the seen stimuli. The results also suggest that, with respect to language decoding tasks, the higher visual cortex (HVC) is more semantically informative than the lower visual cortex (LVC), and that using only the HVC can recover most of the semantic information. The code of the MindGPT model will be publicly available at https://github.com/JxuanC/MindGPT.
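
The abstract names cross-attention as the mechanism linking neural representations to language. As a rough illustration only, the following is a minimal sketch of generic scaled dot-product cross-attention in plain Python. It is not taken from the MindGPT code. The names `text_states` and `fmri_tokens`, the shapes, and the choice of which side supplies the queries are all assumptions made for the example, and learned projection matrices are omitted for brevity.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (no learned projections).

    queries: n_q vectors of dimension d (assumed here: language-side states)
    keys:    n_k vectors of dimension d (assumed here: fMRI token embeddings)
    values:  n_k vectors of any fixed dimension
    Returns (outputs, weights); weights[i][j] is how much query i attends
    to key j.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    value_dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w[j] * values[j][c] for j in range(len(values)))
             for c in range(value_dim)]
        )
    return outputs, weights


# Hypothetical toy inputs: 4 language-side states and 6 fMRI tokens, dim 8.
rng = random.Random(0)


def rand_vectors(n, d):
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


text_states = rand_vectors(4, 8)
fmri_tokens = rand_vectors(6, 8)
attended, attn = cross_attention(text_states, fmri_tokens, fmri_tokens)
```

Each row of `attn` sums to 1 and says how strongly one language-side state draws on each fMRI token. Attention weights of this kind are one quantity that could be inspected when asking which inputs contribute to the generated language, though the paper's actual analysis may differ.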
