LG | AI | Feb 26

Autoregressive Visual Decoding from EEG Signals

arXiv:2602.22555v1 | 1 citation | h-index: 4
Originality: Incremental advance
AI Analysis

This work provides a more efficient and interpretable method for visual decoding from EEG signals, which is significant for researchers and developers working on practical brain-computer interface (BCI) applications.

This paper introduces AVDE, a lightweight framework for decoding visual information from EEG signals. It addresses the modality gap between EEG and image data in two steps: it first aligns EEG and image representations via contrastive learning, then uses an autoregressive generative model to predict multi-scale image tokens. AVDE outperforms previous state-of-the-art methods in image retrieval and reconstruction tasks on two datasets while using only 10% of their parameters.
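The alignment step can be pictured as a CLIP-style symmetric contrastive objective. The sketch below is an assumption, not the paper's implementation: the summary only states that contrastive learning is used, so the InfoNCE form, the `temperature` value, and all function names are illustrative. It uses plain Python lists in place of real EEG and image encoder outputs.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row k of eeg_emb is paired with row k of img_emb.

    The temperature is a hypothetical default, not a value from the paper.
    """
    eeg = [l2_normalize(v) for v in eeg_emb]
    img = [l2_normalize(v) for v in img_emb]
    n = len(eeg)
    # logits[r][c] = scaled cosine similarity between EEG r and image c.
    logits = [[sum(a * b for a, b in zip(e, i)) / temperature for i in img]
              for e in eeg]
    # EEG -> image direction: each EEG sample should pick its own image.
    eeg_to_img = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Image -> EEG direction: same idea on the transposed similarity matrix.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    img_to_eeg = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (eeg_to_img + img_to_eeg)

def retrieve(eeg_vec, img_emb):
    """Return the index of the image embedding most similar to one EEG embedding."""
    query = l2_normalize(eeg_vec)
    sims = [sum(a * b for a, b in zip(query, l2_normalize(i))) for i in img_emb]
    return max(range(len(sims)), key=sims.__getitem__)

if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]      # EEG embeddings aligned with images
    mismatched = [[0.0, 1.0], [1.0, 0.0]]   # pairs swapped
    print(contrastive_loss(matched, images) < contrastive_loss(mismatched, images))
    print(retrieve([0.9, 0.1], images))
```

Once the two embedding spaces are aligned in this way, image retrieval reduces to a nearest-neighbour lookup, as in the `retrieve` helper: the loss is low when each EEG embedding is closest to its own image and high when the pairs are swapped.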

Electroencephalogram (EEG) signals have become a popular medium for decoding visual information due to their cost-effectiveness and high temporal resolution. However, current approaches face significant challenges in bridging the modality gap between EEG and image data. These methods typically rely on complex adaptation processes involving multiple stages, making it hard to maintain consistency and manage compounding errors. Furthermore, the computational overhead imposed by large-scale diffusion models limits their practicality in real-world brain-computer interface (BCI) applications. In this work, we present AVDE, a lightweight and efficient framework for visual decoding from EEG signals. First, we leverage LaBraM, a pre-trained EEG model, and fine-tune it via contrastive learning to align EEG and image representations. Second, we adopt an autoregressive generative framework based on a "next-scale prediction" strategy: images are encoded into multi-scale token maps using a pre-trained VQ-VAE, and a transformer is trained to autoregressively predict finer-scale tokens starting from EEG embeddings as the coarsest representation. This design enables coherent generation while preserving a direct connection between the input EEG signals and the reconstructed images. Experiments on two datasets show that AVDE outperforms previous state-of-the-art methods in both image retrieval and reconstruction tasks, while using only 10% of the parameters. In addition, visualization of intermediate outputs shows that the generative process of AVDE reflects the hierarchical nature of human visual perception. These results highlight the potential of autoregressive models as efficient and interpretable tools for practical BCI applications.
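The "next-scale prediction" loop described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the scale schedule `SCALES`, the `CODEBOOK_SIZE`, and `toy_predictor` (a random stand-in for the trained transformer) are all hypothetical, and the VQ-VAE decoder that would turn token maps back into pixels is omitted.

```python
import random

SCALES = [1, 2, 4]    # hypothetical side lengths of the token maps, coarse to fine
CODEBOOK_SIZE = 16    # hypothetical number of VQ-VAE codebook entries

def toy_predictor(context, side, rng):
    """Stand-in for the transformer.

    A real model would attend over `context` (the EEG embedding plus all
    coarser token maps) and emit every token of the next map in parallel.
    Here we only sample random token ids of the right shape.
    """
    return [[rng.randrange(CODEBOOK_SIZE) for _ in range(side)]
            for _ in range(side)]

def next_scale_decode(eeg_embedding, predictor, scales=SCALES, seed=0):
    """Generate token maps coarse to fine, conditioned on an EEG embedding."""
    rng = random.Random(seed)
    # The EEG embedding seeds the sequence in place of a coarsest-scale input.
    context = [("eeg", eeg_embedding)]
    token_maps = []
    for side in scales:
        # One autoregressive step per scale: predict a whole side x side map.
        tokens = predictor(context, side, rng)
        token_maps.append(tokens)
        # The new map becomes conditioning for every finer scale.
        context.append(("tokens", tokens))
    return token_maps

if __name__ == "__main__":
    maps = next_scale_decode([0.1, -0.3, 0.7], toy_predictor)
    for side, token_map in zip(SCALES, maps):
        print(f"{side}x{side}:", token_map)
```

With this toy schedule, the loop emits 1 + 4 + 16 = 21 tokens in three autoregressive steps, whereas token-by-token generation would need 21 steps. The intermediate maps are also what makes the coarse-to-fine process inspectable, since each one could in principle be decoded and viewed separately.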
