CL SD AS NC | Jun 4, 2025

Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain

CMU

arXiv:2506.03832v1 | 6.73 citations | h-index: 15 | INTERSPEECH

Originality: Incremental advance

AI Analysis

This work addresses the need in neuroscience and AI for better model organisms of human speech processing, though the advance is incremental because it builds on prior brain-tuning work.

The study tackled the problem that pretrained self-supervised speech models do not reflect the hierarchy of human speech processing, finding that brain-tuned models improve alignment with semantic brain regions and exhibit a clear acoustic-to-semantic hierarchy.

Pretrained self-supervised speech models excel in speech tasks but do not reflect the hierarchy of human speech processing, as they encode rich semantics in middle layers and poor semantics in late layers. Recent work showed that brain-tuning (fine-tuning models using human brain recordings) improves speech models' semantic understanding. Here, we examine how well brain-tuned models further reflect the brain's intermediate stages of speech processing. We find that late layers of brain-tuned models substantially improve over pretrained models in their alignment with semantic language regions. Further layer-wise probing reveals that early layers remain dedicated to low-level acoustic features, while late layers become the best at complex high-level tasks. These findings show that brain-tuned models not only perform better but also exhibit a well-defined hierarchical processing going from acoustic to semantic representations, making them better model organisms for human speech processing.
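The layer-wise probing mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a linear ridge probe scored by held-out Pearson correlation, and it uses synthetic features in place of real model activations. The names `ridge_fit`, `layerwise_probe`, `layer_feats`, and `target` are hypothetical, chosen only for illustration.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge weights; the caller is expected to center X and y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)


def layerwise_probe(layer_feats, target, train_frac=0.8, alpha=1.0):
    """Fit one linear probe per layer and return its held-out Pearson correlation."""
    n_train = int(target.shape[0] * train_frac)
    y_tr, y_te = target[:n_train], target[n_train:]
    scores = []
    for X in layer_feats:
        X_tr, X_te = X[:n_train], X[n_train:]
        # Center using training statistics only, to avoid leaking test data.
        mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
        w = ridge_fit(X_tr - mu_x, y_tr - mu_y, alpha)
        pred = (X_te - mu_x) @ w + mu_y
        scores.append(float(np.corrcoef(pred, y_te)[0, 1]))
    return scores


# Synthetic stand-in for per-layer activations (assumption: 4 layers, 16 dims).
rng = np.random.default_rng(0)
n, d, n_layers = 500, 16, 4
layer_feats = [rng.standard_normal((n, d)) for _ in range(n_layers)]

# The toy target is constructed to depend only on the last layer, mimicking
# a high-level property that is best decoded from late layers.
true_w = rng.standard_normal(d)
target = layer_feats[-1] @ true_w + 0.1 * rng.standard_normal(n)

scores = layerwise_probe(layer_feats, target)
best_layer = int(np.argmax(scores))
```

With real data, `layer_feats` would hold the hidden states of each model layer for the same stimuli, and `target` would be either a task label (probing) or a brain recording channel (alignment). Comparing `scores` across layers for pretrained and brain-tuned models is the kind of analysis the abstract describes.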
