A discriminative condition-aware backend for speaker verification
This work addresses speaker verification when test conditions are unknown and no domain-specific calibration data is available. It is an incremental improvement over standard PLDA-based backends.
The paper proposes a discriminative condition-aware backend for speaker verification. All of its parameters are jointly trained to optimize binary cross-entropy, and the calibration stage is integrated into the model, with parameters that depend on metadata vectors representing the signal conditions. The resulting backend shows excellent out-of-the-box calibration performance on most test sets, making it suitable for unknown test conditions without development data.
We present a scoring approach for speaker verification that mimics the standard PLDA-based backend process used in most current speaker verification systems. However, unlike the standard backends, all parameters of the model are jointly trained to optimize the binary cross-entropy for the speaker verification task. We further integrate the calibration stage inside the model, making the parameters of this stage depend on metadata vectors that represent the conditions of the signals. We show that the proposed backend has excellent out-of-the-box calibration performance on most of our test sets, making it an ideal approach for cases in which the test conditions are not known and development data is not available for training a domain-specific calibration model.
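The scoring pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the functional form of the condition-dependent calibration, so the symmetric bilinear form used here for the scale and offset is an assumption, and all function and parameter names (`plda_like_score`, `condition_aware_llr`, `Lam`, `Gam`, `A`, `B`, and so on) are hypothetical. The sketch covers only the forward computation and the loss; joint training would require gradients from an automatic differentiation framework.

```python
import numpy as np

def plda_like_score(x1, x2, Lam, Gam, c, k):
    """Symmetric quadratic score between two embeddings.

    This is the general shape of a Gaussian PLDA log-likelihood ratio;
    in a discriminative backend Lam, Gam, c and k are free parameters.
    """
    cross = x1 @ Lam @ x2 + x2 @ Lam @ x1
    within = x1 @ Gam @ x1 + x2 @ Gam @ x2
    return cross + within + (x1 + x2) @ c + k

def condition_aware_llr(x1, x2, m1, m2, p):
    """Calibrated score whose scale and offset depend on condition vectors.

    ASSUMPTION: alpha and beta are symmetric bilinear functions of the
    metadata vectors m1 and m2. The source only states that the
    calibration parameters depend on such vectors.
    """
    raw = plda_like_score(x1, x2, p["Lam"], p["Gam"], p["c"], p["k"])
    alpha = m1 @ p["A"] @ m2 + m2 @ p["A"] @ m1 + (m1 + m2) @ p["a"] + p["a0"]
    beta = m1 @ p["B"] @ m2 + m2 @ p["B"] @ m1 + (m1 + m2) @ p["b"] + p["b0"]
    return alpha * raw + beta

def binary_cross_entropy(llrs, labels, prior_logit=0.0):
    """Mean binary cross-entropy of log-likelihood-ratio scores.

    labels: 1.0 for same-speaker trials, 0.0 for different-speaker trials.
    prior_logit: log-odds of the target prior added to each score.
    """
    logits = np.asarray(llrs, dtype=float) + prior_logit
    labels = np.asarray(labels, dtype=float)
    # log(1 + exp(-z)) for targets, log(1 + exp(z)) for non-targets
    losses = labels * np.logaddexp(0.0, -logits) \
        + (1.0 - labels) * np.logaddexp(0.0, logits)
    return losses.mean()
```

With `A`, `a`, `B` and `b` set to zero, the calibration reduces to a single global scale `a0` and offset `b0`, which corresponds to a conventional condition-independent affine calibration.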