Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe submission to NIST SRE Challenge 2019
This work addresses quality-dependent fusion for speaker-face verification systems. The contribution is incremental, as it builds upon existing multimodal biometric techniques.
The paper tackles the problem of improving multimodal biometric verification. It proposes a universal model for automatic quality assessment of both face and speaker modalities, uses the resulting quality estimates to enhance score-level fusion, and demonstrates improvements on the NIST SRE19 Audio-Visual Challenge dataset.
Score fusion is a cornerstone of multimodal biometric systems composed of independent unimodal parts. In this work, we focus on quality-dependent fusion for speaker-face verification. To this end, we propose a universal model that can be trained for automatic quality assessment of both face and speaker modalities. The model estimates the quality of the representations produced by the unimodal systems, and these quality estimates are then used to enhance the score-level fusion of the speaker and face verification modules. We demonstrate the improvements brought by this quality-dependent fusion on the recent NIST SRE19 Audio-Visual Challenge dataset.
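
To make the idea of quality-dependent score-level fusion concrete, the following is a minimal sketch and not the fusion rule used in the submission, which the abstract does not specify. It assumes each unimodal system outputs one verification score per trial and that a quality estimator returns a value in [0, 1] for each modality. The function name `quality_weighted_fusion`, the quality-proportional weighting, and the equal-weight fallback are all illustrative assumptions.

```python
def quality_weighted_fusion(speaker_score, face_score,
                            speaker_quality, face_quality):
    """Fuse two verification scores with weights proportional to quality.

    Hypothetical illustration only: the weighting scheme is an assumption,
    not the method described in the paper.
    """
    # Clamp quality estimates to the assumed [0, 1] range.
    q_s = min(max(speaker_quality, 0.0), 1.0)
    q_f = min(max(face_quality, 0.0), 1.0)

    total = q_s + q_f
    if total == 0.0:
        # Both modalities judged unusable: fall back to equal weights.
        w_s = 0.5
    else:
        # The speaker weight grows with its share of the total quality.
        w_s = q_s / total

    # Convex combination of the two unimodal scores.
    return w_s * speaker_score + (1.0 - w_s) * face_score


if __name__ == "__main__":
    # A low face quality shifts the fused score toward the speaker score.
    print(quality_weighted_fusion(2.0, -1.0, 0.9, 0.1))
```

With equal quality estimates this reduces to a plain average of the two scores. When one modality is judged unreliable, for example a face track with heavy occlusion, the fused score leans on the other modality. In practice the unimodal scores would need to be calibrated to a common scale before such a combination is meaningful.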