CV | AI | Oct 3, 2025

HAVIR: HierArchical Vision to Image Reconstruction using CLIP-Guided Versatile Diffusion

arXiv:2510.03122v2 | 1 citation | h-index: 9 | BIBM
Originality: Incremental advance
AI Analysis

This work addresses the accurate recovery of complex visual stimuli from brain activity, a problem at the intersection of neuroscience and computer vision. It is an incremental improvement over prior methods.

The paper proposes HAVIR, a model that reconstructs visual information from brain activity by separating the visual cortex into two hierarchical regions, one yielding structural features and the other semantic features. The two are integrated using Versatile Diffusion to synthesize images, and the authors report better structural and semantic quality than existing models.

The reconstruction of visual information from brain activity fosters interdisciplinary integration between neuroscience and computer vision. However, existing methods still face challenges in accurately recovering highly complex visual stimuli. This difficulty stems from the characteristics of natural scenes: low-level features exhibit heterogeneity, while high-level features show semantic entanglement due to contextual overlaps. Inspired by the hierarchical representation theory of the visual cortex, we propose the HAVIR model, which separates the visual cortex into two hierarchical regions and extracts distinct features from each. Specifically, the Structural Generator extracts structural information from spatial processing voxels and converts it into latent diffusion priors, while the Semantic Extractor converts semantic processing voxels into CLIP embeddings. These components are integrated via the Versatile Diffusion model to synthesize the final image. Experimental results demonstrate that HAVIR enhances both the structural and semantic quality of reconstructions, even in complex scenes, and outperforms existing models.
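The two-pathway data flow described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the abstract does not say how the Structural Generator or the Semantic Extractor are implemented, so plain ridge regression stands in for both, the voxel mask, array sizes, and function names (`split_voxels`, `fit_ridge`) are hypothetical, and random arrays replace real fMRI data, diffusion latents, and CLIP embeddings. The final Versatile Diffusion step is not shown.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
N_VOXELS = 1000
LATENT_SHAPE = (4, 8, 8)   # stand-in for a diffusion latent
CLIP_DIM = 16              # stand-in for a CLIP embedding width


def split_voxels(voxels, spatial_mask):
    """Split a (n_samples, n_voxels) array into spatial and semantic voxel groups."""
    return voxels[:, spatial_mask], voxels[:, ~spatial_mask]


def fit_ridge(x, y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = x.T @ x + alpha * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


rng = np.random.default_rng(0)
n_train = 50

# Random stand-ins for training data (assumption: real data would be fMRI responses).
voxels = rng.standard_normal((n_train, N_VOXELS))
spatial_mask = np.zeros(N_VOXELS, dtype=bool)
spatial_mask[:400] = True  # hypothetical: first 400 voxels are "spatial processing"

latents = rng.standard_normal((n_train, int(np.prod(LATENT_SHAPE))))
clip_emb = rng.standard_normal((n_train, CLIP_DIM))

# One linear map per pathway (a stand-in, not the paper's actual modules).
spatial_vox, semantic_vox = split_voxels(voxels, spatial_mask)
w_struct = fit_ridge(spatial_vox, latents)
w_sem = fit_ridge(semantic_vox, clip_emb)

# Predict both conditioning signals for one unseen sample.
test_voxels = rng.standard_normal((1, N_VOXELS))
test_spatial, test_semantic = split_voxels(test_voxels, spatial_mask)
latent_prior = (test_spatial @ w_struct).reshape(1, *LATENT_SHAPE)
clip_condition = test_semantic @ w_sem
```

In the described pipeline, `latent_prior` and `clip_condition` would both be passed to the diffusion model, the first as a structural starting point and the second as semantic guidance.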

