SD, AS · Dec 23, 2020

CN-Celeb: multi-genre speaker recognition

arXiv:2012.12468v2 · 157 citations
AI Analysis

This work provides a crucial large-scale, in-the-wild, multi-genre dataset for researchers tackling genre mismatch, a challenging problem in speaker recognition. It represents an incremental step toward more robust systems.

This paper introduces CN-Celeb, a large-scale dataset of 3,000 speakers across 11 diverse genres, collected in the wild to address the challenge of genre mismatch in speaker recognition. The authors use this dataset to study the impact of multi-genre conditions on speaker recognition performance and demonstrate the benefits of multi-genre training.

Research on speaker recognition is extending to address its vulnerability under in-the-wild conditions, and genre mismatch is perhaps the most challenging of these: for instance, enrollment uses read speech while testing uses conversational or singing audio. This mismatch leads to complex, composite inter-session variations, both intrinsic (e.g., speaking style, physiological status) and extrinsic (e.g., recording device, background noise). Unfortunately, the few existing multi-genre corpora are not only limited in size but also recorded under controlled conditions, so they cannot support conclusive research on the multi-genre problem. In this work, we first publish CN-Celeb, a large-scale multi-genre corpus that includes in-the-wild speech utterances of 3,000 speakers in 11 different genres. Second, using this dataset, we conduct a comprehensive study of the multi-genre phenomenon, in particular the impact of the multi-genre challenge on speaker recognition and the performance gain when the new dataset is used for multi-genre training.
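The abstract does not spell out an evaluation protocol. One common way to quantify the impact of genre mismatch is to score verification trials, group them by (enrollment genre, test genre), and compare the equal error rate (EER) of within-genre pairs against cross-genre pairs. The sketch below assumes precomputed speaker embeddings and cosine scoring. The function names, the genre labels, and the toy trial list are hypothetical illustrations, not the paper's method or the CN-Celeb release format.

```python
from collections import defaultdict

import numpy as np


def cosine_score(a, b):
    # Cosine similarity between two (assumed precomputed) speaker embeddings.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def equal_error_rate(scores, labels):
    """Approximate EER: the point where false-acceptance rate ~= false-rejection rate.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Tied scores are ordered arbitrarily, which is acceptable for a sketch.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")
    # Sweep the threshold from high to low: accepting the top-k scored trials.
    labels_sorted = labels[np.argsort(-scores, kind="stable")]
    fa = np.cumsum(1 - labels_sorted) / n_nontarget  # non-targets accepted
    fr = 1 - np.cumsum(labels_sorted) / n_target     # targets still rejected
    # Prepend the "accept nothing" operating point.
    fa = np.concatenate([[0.0], fa])
    fr = np.concatenate([[1.0], fr])
    idx = int(np.argmin(np.abs(fa - fr)))
    return float((fa[idx] + fr[idx]) / 2)


def eer_by_genre_pair(trials):
    """trials: iterable of (enroll_genre, test_genre, score, label) tuples."""
    groups = defaultdict(lambda: ([], []))
    for enroll_genre, test_genre, score, label in trials:
        scores, labels = groups[(enroll_genre, test_genre)]
        scores.append(score)
        labels.append(label)
    return {pair: equal_error_rate(s, l) for pair, (s, l) in groups.items()}


# Hypothetical toy trials: within-genre scores separate cleanly, cross-genre do not.
trials = [
    ("reading", "reading", 0.9, 1),
    ("reading", "reading", 0.2, 0),
    ("reading", "singing", 0.4, 1),
    ("reading", "singing", 0.5, 0),
]
print(eer_by_genre_pair(trials))
# {('reading', 'reading'): 0.0, ('reading', 'singing'): 1.0}
```

Grouping by genre pair keeps the mismatch visible: a single pooled EER would average the easy within-genre trials with the hard cross-genre ones and hide exactly the effect the study targets.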
