Matteo Ciferri

NC
h-index: 10
5 papers
3 citations
Novelty: 45%
AI Score: 43

5 Papers

45.3 NC Jun 1
Mapping Whisper Representations to Human ECoG Responses with Interpretable Time-Resolved Neural Encoding

Matteo Ciferri, Tommaso Boccato, Michal Olak et al.

Understanding how speech foundation models relate to human cortical activity is a key challenge for computational neuroscience. Here, we investigate how internal representations from Whisper predict intracranial ECoG responses during naturalistic speech perception. We introduce a time-resolved neural encoder that combines speech embeddings with a recurrent temporal model and soft attention, allowing us to examine layer-wise brain alignment. Intermediate Whisper layers provide the strongest correspondence with neural activity, supporting a hierarchical match between model representations and cortical speech processing. Comparisons with baselines show that high-resolution ECoG responses benefit from temporally structured modelling beyond linear mappings from the same speech representations. In addition, attention maps reveal temporally local alignment between speech embeddings and neural responses, while a phonemic interpretability analysis identifies anatomically coherent phoneme-category organization among encoding-informative electrodes. Together, these results suggest that speech foundation models offer a useful framework for studying time-resolved cortical speech representations.

34.0 NC Apr 15
Seeing the imagined: a latent functional alignment in visual imagery decoding from fMRI data

Fabrizio Spera, Tommaso Boccato, Michal Olak et al.

Recent progress in visual brain decoding from fMRI has been enabled by large-scale datasets such as the Natural Scenes Dataset (NSD) and powerful diffusion-based generative models. While current pipelines are primarily optimized for perception, their performance under mental imagery remains less well understood. In this work, we study how a state-of-the-art (SOTA) perception decoder (DynaDiff) can be adapted to reconstruct imagined content from the Imagery-NSD benchmark. We propose a latent functional alignment approach that maps imagery-evoked activity into the pretrained model's conditioning space, while keeping the remaining components frozen. To mitigate the limited amount of matched imagery-perception supervision, we further introduce a retrieval-based augmentation strategy that selects semantically related NSD perception trials. Across four subjects, latent functional alignment consistently improves high-level semantic reconstruction metrics relative to the frozen pretrained baseline and a voxel-space ridge alignment baseline, and enables above-chance decoding from multiple cortical regions. These results suggest that semantic structure learned from perception can be leveraged to stabilize and improve visual imagery decoding under out-of-distribution conditions.

NC Jan 16
Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals

Matteo Ciferri, Matteo Ferrante, Nicola Toschi

Understanding how neural activity gives rise to perception is a central challenge in neuroscience. We address the problem of decoding visual information from high-density intracortical recordings in primates, using the THINGS Ventral Stream Spiking Dataset. We systematically evaluate the effects of model architecture, training objectives, and data scaling on decoding performance. Results show that decoding accuracy is mainly driven by modeling temporal dynamics in neural signals, rather than architectural complexity. A simple model combining temporal attention with a shallow MLP achieves up to 70% top-1 image retrieval accuracy, outperforming linear baselines as well as recurrent and convolutional approaches. Scaling analyses reveal predictable diminishing returns with increasing input dimensionality and dataset size. Building on these findings, we design a modular generative decoding pipeline that combines low-resolution latent reconstruction with semantically conditioned diffusion, generating plausible images from 200 ms of brain activity. This framework provides principles for brain-computer interfaces and semantic neural decoding.
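
The abstract above credits decoding performance to temporal attention followed by a shallow MLP. The sketch below illustrates only the attention-pooling step, as one plausible reading: each time bin is scored against a query vector, the scores are normalised with a softmax, and the bins are averaged with those weights. The function names, the `query` parameter, and the dot-product scoring rule are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(features, query):
    """Collapse T time-bin vectors into one vector with soft attention.

    features: list of T vectors, each of length D (e.g. binned neural activity).
    query:    length-D scoring vector (hypothetical learned parameter).
    Returns (pooled_vector, attention_weights).
    """
    # Dot-product score for each time bin (an assumed scoring rule).
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Weighted average of the time bins, one output value per feature.
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights
```

In a full model the pooled vector would feed the MLP, and the returned weights indicate which time bins the model relies on.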

NC Dec 22, 2024
Bridging Auditory Perception and Language Comprehension through MEG-Driven Encoding Models

Matteo Ciferri, Matteo Ferrante, Nicola Toschi

Understanding the neural mechanisms behind auditory and linguistic processing is key to advancing cognitive neuroscience. In this study, we use Magnetoencephalography (MEG) data to analyze brain responses to spoken language stimuli. We develop two distinct encoding models: an audio-to-MEG encoder, which uses time-frequency decompositions (TFD) and wav2vec2 latent space representations, and a text-to-MEG encoder, which leverages CLIP and GPT-2 embeddings. Both models successfully predict neural activity, demonstrating significant correlations between estimated and observed MEG signals. However, the text-to-MEG model outperforms the audio-based model, achieving a higher Pearson correlation (PC) score. Spatially, we identify that auditory-based embeddings (TFD and wav2vec2) predominantly activate lateral temporal regions, which are responsible for primary auditory processing and the integration of auditory signals. In contrast, textual embeddings (CLIP and GPT-2) primarily engage the frontal cortex, particularly Broca's area, which is associated with higher-order language processing, including semantic integration and language production, especially in the 8-30 Hz frequency range. The strong involvement of these regions suggests that auditory stimuli are processed through more direct sensory pathways, while linguistic information is encoded via networks that integrate meaning and cognitive control. Our results reveal distinct neural pathways for auditory and linguistic information processing, with higher encoding accuracy for text representations in the frontal regions. These insights refine our understanding of the brain's functional architecture in processing auditory and textual information, offering quantitative advancements in the modelling of neural responses to complex language stimuli.
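
Encoding models such as those described above are scored by correlating predicted and observed signals. A minimal sketch of a per-channel Pearson correlation follows, assuming each channel is a plain list of time samples. The function names and the choice to return 0.0 for a constant signal are illustrative assumptions, not details from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant signal; 0.0 is a convention chosen here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def channelwise_pearson(predicted, observed):
    """Score an encoder channel by channel.

    predicted, observed: lists of channels, each channel a list of time samples.
    Returns one correlation value per channel.
    """
    return [pearson(p, o) for p, o in zip(predicted, observed)]
```

Averaging the per-channel values within a region gives one simple way to compare, say, temporal and frontal sensors across the audio and text encoders.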

NC Jun 21, 2024
R&B -- Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity

Matteo Ferrante, Matteo Ciferri, Nicola Toschi

Music is a universal phenomenon that profoundly influences human experiences across cultures. This study investigates whether music can be decoded from human brain activity measured with functional MRI (fMRI) during its perception. Leveraging recent advancements in extensive datasets and pre-trained computational models, we construct mappings between neural data and latent representations of musical stimuli. Our approach integrates functional and anatomical alignment techniques to facilitate cross-subject decoding, addressing the challenges posed by the low temporal resolution and signal-to-noise ratio (SNR) in fMRI data. Starting from the GTZan fMRI dataset, where five participants listened to 540 musical stimuli from 10 different genres while their brain activity was recorded, we used the CLAP (Contrastive Language-Audio Pretraining) model to extract latent representations of the musical stimuli and developed voxel-wise encoding models to identify brain regions responsive to these stimuli. By applying a threshold to the association between predicted and actual brain activity, we identified specific regions of interest (ROIs) which can be interpreted as key players in music processing. Our decoding pipeline, primarily retrieval-based, employs a linear map to project brain activity to the corresponding CLAP features. This enables us to predict and retrieve the musical stimuli most similar to those that originated the fMRI data. Our results demonstrate state-of-the-art identification accuracy, with our methods significantly outperforming existing approaches. Our findings suggest that neural-based music retrieval systems could enable personalized recommendations and therapeutic applications. Future work could use higher temporal resolution neuroimaging and generative models to improve decoding accuracy and explore the neural underpinnings of music perception and emotion.
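
The retrieval step described above can be sketched as follows, assuming the brain-to-feature linear map has already been applied so that each trial is a predicted feature vector. Cosine similarity as the ranking score and all function names are assumptions for illustration; the abstract only states that the most similar stimuli are retrieved.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def retrieve(predicted, candidates):
    """Return the index of the candidate most similar to the predicted vector."""
    sims = [cosine(predicted, c) for c in candidates]
    return max(range(len(candidates)), key=lambda i: sims[i])


def top1_accuracy(predicted_set, candidates):
    """Fraction of trials whose best match is the true stimulus.

    Assumes predicted_set[i] was decoded from the trial in which
    candidates[i] was presented.
    """
    hits = sum(1 for i, p in enumerate(predicted_set) if retrieve(p, candidates) == i)
    return hits / len(predicted_set)
```

With N candidate stimuli, chance level for this top-1 score is 1/N, which is the baseline any reported identification accuracy should be read against.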