Looking through the mind's eye via multimodal encoder-decoder networks
This work addresses the problem of interpreting brain activity to decode mental imagery, which could benefit neuroscience and brain-computer interfaces; it is incremental, however, in that it builds on existing fMRI-to-video mapping methods.
The authors tackled decoding mental imagery from fMRI measurements by creating a mapping between fMRI signals and visual imagery, then aligning the latent representations of text-prompted and video-elicited fMRI using textual labels to decode visual content. They demonstrated efficacy on an augmented dataset of eight subjects, achieving plausible decoding results.
In this work, we explore the decoding of mental imagery from subjects' fMRI measurements. To achieve this decoding, we first create a mapping between a subject's fMRI signals and the videos that elicited them. This mapping associates the high-dimensional fMRI activation states with visual imagery. Next, we prompt the subjects textually, primarily with emotion labels that make no direct reference to visual objects. Then, to decode visual imagery that may have been in a person's mind's eye, we align a latent representation of these text-prompted fMRI measurements with that of the corresponding video-elicited fMRI, based on textual labels given to the videos themselves. This alignment has the effect of overlapping the video fMRI embedding with the text-prompted fMRI embedding, thus allowing us to use our fMRI-to-video mapping to decode. Additionally, we extend an existing fMRI dataset, initially consisting of data from five subjects, with recordings from three more subjects gathered by our team. We demonstrate the efficacy of our model on this augmented dataset both in accurately creating the mapping and in plausibly decoding mental imagery.