Rapid Connectionist Speaker Adaptation
This paper addresses speaker adaptation for speech recognition systems, but the contribution appears incremental because it builds on existing methods such as MS-TDNN.
The paper tackles speaker variability in speech recognition by introducing SVCnet, which generates a Speaker Voice Code from a brief voice sample and uses it to adapt a recognition system without retraining. The abstract does not quantify the resulting performance gains.
We present SVCnet, a system for modelling speaker variability. Encoder Neural Networks, each specialized for one speech sound, produce low-dimensional models of acoustical variation, and these models are then combined into an overall model of voice variability. A training procedure is described that minimizes the dependence of this model on which sounds have been uttered. Using the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a Speaker Voice Code that can be used to adapt a recognition system to the new speaker without retraining. A system that combines SVCnet with an MS-TDNN recognizer is described.
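The data flow described in the abstract (per-sound encoders, low-dimensional codes, one combined voice code) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the feature and code sizes, the three-sound inventory, the random linear maps standing in for trained encoder networks, and the plain averaging used as the combination rule. In particular, averaging only guarantees a fixed-size output; it does not reproduce the training procedure that minimizes dependence on which sounds were uttered.

```python
import random

FEATURE_DIM = 16          # assumed size of one acoustic feature frame
CODE_DIM = 3              # assumed size of each low-dimensional code
SOUNDS = ["a", "e", "s"]  # hypothetical inventory of speech sounds


def make_encoder(rng):
    """Random CODE_DIM x FEATURE_DIM linear map, a stand-in for a trained encoder."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
            for _ in range(CODE_DIM)]


def encode(encoder, frame):
    """Project one acoustic frame onto the encoder's low-dimensional code."""
    return [sum(w * x for w, x in zip(row, frame)) for row in encoder]


def speaker_voice_code(encoders, sample):
    """Combine per-sound codes into one fixed-size vector.

    `sample` maps a sound label to the frames observed for that sound.
    Sounds absent from the sample are skipped, so the output size does not
    depend on which sounds were uttered.
    """
    codes = []
    for sound, frames in sample.items():
        for frame in frames:
            codes.append(encode(encoders[sound], frame))
    if not codes:
        raise ValueError("sample contains no frames")
    # Assumed combination rule: average all per-frame codes.
    return [sum(c[i] for c in codes) / len(codes) for i in range(CODE_DIM)]


rng = random.Random(0)
encoders = {s: make_encoder(rng) for s in SOUNDS}

# A brief, unconstrained sample: only two of the three sounds occur.
sample = {
    "a": [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(4)],
    "s": [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(2)],
}
svc = speaker_voice_code(encoders, sample)
print(len(svc))
```

In a full system, a vector like `svc` would be supplied to the recognizer as an additional input so that recognition is conditioned on the speaker without retraining; how the paper wires this into the MS-TDNN is not specified in the abstract.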