NC · HC · Jun 1

Mapping Whisper Representations to Human ECoG Responses with Interpretable Time-Resolved Neural Encoding

arXiv:2606.02305
AI Analysis

For computational neuroscientists studying cortical speech processing, this work provides a method to link deep speech models to neural data, revealing hierarchical alignment and phoneme-category organization.

The study introduces a time-resolved neural encoder that maps Whisper's internal representations to human ECoG responses during speech perception, finding that intermediate layers align best with cortical activity and that temporal modeling improves predictions over linear baselines.

Understanding how speech foundation models relate to human cortical activity is a key challenge for computational neuroscience. Here, we investigate how internal representations from Whisper predict intracranial ECoG responses during naturalistic speech perception. We introduce a time-resolved neural encoder that combines speech embeddings with a recurrent temporal model and soft attention, allowing us to examine layer-wise brain alignment. Intermediate Whisper layers provide the strongest correspondence with neural activity, supporting a hierarchical match between model representations and cortical speech processing. Comparisons with baselines show that high-resolution ECoG responses benefit from temporally structured modelling beyond linear mappings from the same speech representations. In addition, attention maps reveal temporally local alignment between speech embeddings and neural responses, while a phonemic interpretability analysis identifies anatomically coherent phoneme-category organization among encoding-informative electrodes. Together, these results suggest that speech foundation models offer a useful framework for studying time-resolved cortical speech representations.
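As a rough illustration of the soft-attention step mentioned in the abstract, the sketch below weights a short window of speech-embedding frames by a softmax over dot-product scores and reads out one electrode's predicted response from the resulting context vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `attend`, `predict_response`, `query`, and `readout` are hypothetical, the scaled dot-product scoring rule is an assumption, the recurrent temporal model is omitted entirely, and the numbers are toy values rather than Whisper embeddings or ECoG data.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(frames, query):
    """Soft attention over a window of embedding frames.

    frames: list of T embedding vectors, each a list of D floats
    query:  list of D floats (hypothetical learned parameter)
    Returns (weights, context): T attention weights that sum to 1,
    and the attention-weighted average of the frames.
    """
    scale = math.sqrt(len(query))  # assumed scaled dot-product scoring
    weights = softmax([dot(frame, query) / scale for frame in frames])
    dim = len(frames[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return weights, context


def predict_response(frames, query, readout, bias=0.0):
    """Predict one electrode's response at one time point.

    readout: list of D floats (hypothetical linear readout weights)
    Returns (prediction, weights) so the attention map can be inspected.
    """
    weights, context = attend(frames, query)
    return dot(context, readout) + bias, weights


# Toy window of three 2-D frames; the last frame matches the query best.
toy_frames = [[1.0, 0.0], [0.0, 1.0], [4.0, 0.0]]
toy_query = [2.0, 0.0]
toy_readout = [1.0, 1.0]

prediction, attention = predict_response(toy_frames, toy_query, toy_readout)
print("attention weights:", attention)
print("predicted response:", prediction)
```

Returning the weights alongside the prediction is what makes this kind of model inspectable: plotting them against frame position would show whether the attention mass concentrates on frames close in time to the predicted neural sample, which is the sort of temporally local pattern the abstract reports.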

