CL · SD · AS · Jun 19, 2024

Children's Speech Recognition through Discrete Token Enhancement

arXiv:2406.13431v2 · 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses data scarcity and privacy issues in children's speech recognition, though the advance is incremental because it builds on existing tokenization methods.

The study tackled children's speech recognition by integrating discrete tokens to address privacy concerns, achieving nearly equivalent performance with an approximate 83% reduction in parameters.

Children's speech recognition is considered a low-resource task, mainly due to the lack of publicly available data. There are several reasons for this scarcity, including expensive data collection and annotation processes and data privacy constraints. Transforming speech signals into discrete tokens that do not carry sensitive information but still capture both linguistic and acoustic information could be a solution for the privacy concerns. In this study, we investigate the integration of discrete speech tokens into children's speech recognition systems as input without significantly degrading ASR performance. Additionally, we explore single-view and multi-view strategies for creating these discrete labels. Furthermore, we test the models' generalization capabilities on a dataset with unseen domain and nativity. Results reveal that the discrete-token ASR for children achieves nearly equivalent performance with an approximately 83% reduction in parameters.
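
To make "transforming speech signals into discrete tokens" concrete, the sketch below shows one common way such tokens are produced: clustering frame-level feature vectors with k-means and replacing each frame with its cluster index. The abstract does not say which features or clustering method the authors use, so all of this is an assumption: the random feature matrix stands in for real frame-level features, and the codebook size, the function names (`fit_codebook`, `tokenize`, `collapse_repeats`), and the optional repeat-collapsing step are hypothetical illustrations, not the paper's pipeline.

```python
import numpy as np


def tokenize(frames, centroids):
    """Map each feature frame to the index of its nearest centroid."""
    # Squared Euclidean distance from every frame to every centroid.
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def fit_codebook(frames, k, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames.
    centroids = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        ids = tokenize(frames, centroids)
        for c in range(k):
            members = frames[ids == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids


def collapse_repeats(ids):
    """Merge runs of identical consecutive tokens (an optional, assumed step)."""
    ids = np.asarray(ids)
    if ids.size == 0:
        return ids
    keep = np.concatenate(([True], ids[1:] != ids[:-1]))
    return ids[keep]


# Assumption: random vectors stand in for real frame-level speech features
# (200 frames, 16 dimensions); 8 is an arbitrary codebook size.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))
codebook = fit_codebook(features, k=8)
tokens = collapse_repeats(tokenize(features, codebook))
```

The resulting `tokens` array is a sequence of integer IDs that a downstream recognizer could consume in place of the raw audio. Whether such IDs are actually free of sensitive speaker information depends on the features and codebook, which this sketch does not address.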

