CL · Mar 20, 2025

SPACER: A Parallel Dataset of Speech Production And Comprehension of Error Repairs

Shiva Upadhye, Jiaxuan Li, Richard Futrell

arXiv:2503.16745v1 · h-index: 5 · Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Originality: Synthesis-oriented

AI Analysis

This work provides a dataset for integrated research on language production and comprehension. Its contribution is incremental: it builds on prior work by enabling parallel analysis rather than by introducing new methods.

The study addresses the lack of parallel data for investigating error correction in speech by introducing SPACER, a dataset drawn from the Switchboard corpus that captures how speakers and comprehenders correct naturalistic single-word substitution errors. Its exploratory analysis suggests asymmetries in correction strategies that depend on semantic and phonemic factors.

Speech errors are a natural part of communication, yet they rarely lead to complete communicative failure because both speakers and comprehenders can detect and correct errors. Although prior research has examined error monitoring and correction in production and comprehension separately, integrated investigation of both systems has been impeded by the scarcity of parallel data. In this study, we present SPACER, a parallel dataset that captures how naturalistic speech errors are corrected by both speakers and comprehenders. We focus on single-word substitution errors extracted from the Switchboard corpus, accompanied by speakers' self-repairs and comprehenders' responses from an offline text-editing experiment. Our exploratory analysis suggests asymmetries in error correction strategies: speakers are more likely to repair errors that introduce greater semantic and phonemic deviations, whereas comprehenders tend to correct errors that are phonemically similar to more plausible alternatives or do not fit into prior contexts. Our dataset enables future research on integrated approaches toward studying language production and comprehension.
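The abstract contrasts errors by how far they deviate phonemically from the intended word. The sketch below shows one common way to quantify that, assuming phonemic deviation is measured as Levenshtein distance over phoneme sequences normalized by the longer sequence. The paper's actual metric may differ. The function names and the ARPAbet transcriptions are hypothetical illustrations, not SPACER data.

```python
from typing import Sequence


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phoneme tokens (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def phonemic_deviation(error: Sequence[str], target: Sequence[str]) -> float:
    """Edit distance normalized by the longer sequence: 0 = identical, 1 = no overlap."""
    longest = max(len(error), len(target))
    return phoneme_edit_distance(error, target) / longest if longest else 0.0


# Hypothetical hand-written ARPAbet transcriptions (not taken from SPACER).
cat = ["K", "AE1", "T"]
hat = ["HH", "AE1", "T"]
dog = ["D", "AO1", "G"]

print(phonemic_deviation(cat, hat))  # one substitution out of three phonemes
print(phonemic_deviation(cat, dog))  # every phoneme differs
```

Under this assumed measure, a substitution such as "hat" for "cat" counts as phonemically close (about 0.33), while "dog" for "cat" is maximally deviant (1.0). Normalizing by length keeps short and long words on a comparable scale.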
