SD · LG · AS · Feb 23, 2022

Speaker recognition improvement using blind inversion of distortions

arXiv:2203.01164v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the mismatch between controlled training and distorted testing in speaker recognition systems, offering an incremental improvement for audio processing applications.

The paper tackles the degradation of speaker recognition performance caused by nonlinear distortions, such as saturation, in test signals. By combining data fusion and distortion compensation, it raises the recognition rate on saturated speech from 80% to 88.57%.

In this paper we propose the inversion of nonlinear distortions in order to improve the recognition rates of a speaker recognition system. We study the effect of saturations on the test signals, trying to take into account real situations where the training material has been recorded in a controlled situation but the testing signals present some mismatch with the input signal level (saturations). The experimental results show that a combination of data fusion with and without nonlinear distortion compensation can improve the recognition rates with saturated test sentences from 80% to 88.57%, while the result with clean speech (without saturation) is 87.76% for one microphone.
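To make the train/test mismatch concrete, the following is a minimal sketch of the distortion itself, assuming saturation can be modelled as symmetric hard clipping. It does not reproduce the paper's blind inversion or data fusion, which the abstract does not specify. The function names `saturate` and `clipped_fraction`, the 100 Hz test tone, the 8 kHz sampling rate, and the 0.5 clipping limit are all hypothetical choices made for illustration.

```python
import math


def saturate(signal, limit):
    """Hard-clip each sample to the range [-limit, +limit] (assumed saturation model)."""
    return [max(-limit, min(limit, s)) for s in signal]


def clipped_fraction(signal, limit):
    """Fraction of samples whose magnitude reaches the saturation limit."""
    return sum(1 for s in signal if abs(s) >= limit) / len(signal)


# Hypothetical "clean" signal: a 100 Hz sine sampled at 8 kHz, peak amplitude 1.0.
fs = 8000
clean = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]

# Simulated test-time mismatch: the same signal recorded with too much gain,
# so every sample above the limit is flattened.
saturated = saturate(clean, limit=0.5)

print("peak before:", max(clean))
print("peak after: ", max(saturated))
print("share of samples at the limit:", clipped_fraction(saturated, 0.5))
```

A recognizer trained on signals like `clean` and tested on signals like `saturated` sees the kind of mismatch the abstract describes. Compensation would aim to estimate and undo the clipping nonlinearity before feature extraction.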

