SD · AI · AS · Nov 15, 2022

Rapid Connectionist Speaker Adaptation

arXiv:2211.08978v1 · 4 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This paper addresses speaker adaptation for speech recognition systems, but its contribution appears incremental because it builds on existing methods such as MS-TDNN.

The paper tackles speaker variability in speech recognition by introducing SVCnet, which generates a Speaker Voice Code from a brief voice sample and uses it to adapt a recognition system without retraining. The abstract does not report quantitative performance gains.

We present SVCnet, a system for modelling speaker variability. Encoder Neural Networks specialized for each speech sound produce low dimensionality models of acoustical variation, and these models are further combined into an overall model of voice variability. A training procedure is described which minimizes the dependence of this model on which sounds have been uttered. Using the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a Speaker Voice Code that can be used to adapt a recognition system to the new speaker without retraining. A system which combines SVCnet with an MS-TDNN recognizer is described.
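The data flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the encoders below are untrained random linear maps with a tanh, the per-sound codes are combined by plain averaging (the abstract only says they are "combined", and the training procedure that reduces dependence on which sounds were uttered is not reproduced), and the recognizer is adapted by appending the code to each input frame, which is one plausible reading of "without retraining". The names `FRAME_DIM`, `CODE_DIM`, `PHONES`, `speaker_voice_code`, and `adapt_input` are all hypothetical.

```python
import math
import random

FRAME_DIM = 8              # hypothetical acoustic feature size per frame
CODE_DIM = 2               # hypothetical size of the low-dimensional code
PHONES = ["a", "i", "s"]   # hypothetical inventory of speech sounds

_rng = random.Random(0)

# One stand-in encoder per speech sound: a small random weight matrix.
# Assumption: real encoders would be trained networks, not random maps.
ENCODERS = {
    p: [[_rng.gauss(0.0, 0.1) for _ in range(FRAME_DIM)] for _ in range(CODE_DIM)]
    for p in PHONES
}

def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def encode_frame(phone, frame):
    """Map one frame to a CODE_DIM vector using that phone's encoder."""
    return [math.tanh(sum(w * x for w, x in zip(row, frame)))
            for row in ENCODERS[phone]]

def speaker_voice_code(sample):
    """Combine per-phone codes into one fixed-size code.

    `sample` maps a phone label to a list of frames. Phones with no
    frames are skipped, so any non-empty subset of sounds is accepted.
    """
    per_phone = [
        mean_vectors([encode_frame(p, f) for f in frames])
        for p, frames in sample.items() if frames
    ]
    if not per_phone:
        raise ValueError("sample contains no frames")
    return mean_vectors(per_phone)

def adapt_input(frames, code):
    """Append the speaker code to each frame as extra recognizer input."""
    return [list(f) + list(code) for f in frames]

def random_frames(n):
    """Generate n synthetic frames as placeholder voice data."""
    return [[_rng.gauss(0.0, 1.0) for _ in range(FRAME_DIM)] for _ in range(n)]

# A brief, unconstrained sample: only two of the three sounds occur.
sample = {"a": random_frames(5), "s": random_frames(3)}
code = speaker_voice_code(sample)
adapted = adapt_input(random_frames(4), code)
print(len(code), len(adapted), len(adapted[0]))  # 2 4 10
```

The point of the sketch is the shape of the pipeline: the code has a fixed size regardless of which sounds the sample contains, and the recognizer weights are never touched when a new speaker arrives.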

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
