Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
This work addresses the challenge of decoding complex auditory information from non-invasive EEG data for brain-computer interface applications, an incremental step in neural decoding research.
The study reconstructs naturalistic music from EEG data using latent diffusion models, an initial attempt at high-quality music reconstruction, with quantitative evaluation on the NMED-T dataset.
In this article, we explore the potential of latent diffusion models, a family of powerful generative models, for reconstructing naturalistic music from electroencephalogram (EEG) recordings. Rather than simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, we focus on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study is an initial foray into high-quality general music reconstruction from non-invasive EEG data, using end-to-end training directly on raw data with no manual pre-processing or channel selection. We train our models on the public NMED-T dataset and evaluate them quantitatively, proposing neural embedding-based metrics. Our work contributes to ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data to reconstruct complex auditory information.
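To give a rough sense of what an embedding-based metric can look like, the sketch below scores decoded music against the reference by averaging the cosine similarity of paired embedding vectors. This is only an illustration under stated assumptions: the abstract does not specify which metrics are proposed or which embedding network is used, so the function names `cosine_similarity` and `mean_embedding_similarity` and the toy vectors are hypothetical, and the embeddings are assumed to come from some pretrained audio encoder applied to both the ground-truth and the reconstructed clips.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_embedding_similarity(reference_embeddings, decoded_embeddings):
    """Average cosine similarity over paired (reference, decoded) embeddings.

    Hypothetical metric for illustration; not necessarily the one used
    in the study. Each argument is a list of embedding vectors, where
    index i in both lists refers to the same music segment.
    """
    if len(reference_embeddings) != len(decoded_embeddings):
        raise ValueError("need one decoded embedding per reference embedding")
    if not reference_embeddings:
        raise ValueError("at least one embedding pair is required")
    scores = [
        cosine_similarity(ref, dec)
        for ref, dec in zip(reference_embeddings, decoded_embeddings)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy vectors standing in for encoder outputs (made-up values).
    reference = [[1.0, 0.0], [1.0, 0.0]]
    decoded = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_embedding_similarity(reference, decoded))
```

Comparing in an embedding space rather than sample by sample makes such a score sensitive to perceptual content such as timbre and instrumentation instead of exact waveform alignment, which is the usual motivation for this family of metrics.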