KU-ISPL Speaker Recognition Systems under Language mismatch condition for NIST 2016 Speaker Recognition Evaluation
This work addresses language mismatch in speaker recognition systems, which is a domain-specific problem for speech technology, and is incremental as it applies existing techniques to new data.
The paper tackled speaker recognition under language mismatch conditions, where evaluation data in Tagalog and Cantonese differed from English training data, and developed methods like unsupervised language clustering to address this, achieving competitive performance in the NIST 2016 evaluation.
The Korea University Intelligent Signal Processing Lab. (KU-ISPL) developed a speaker recognition system for the SRE16 fixed training condition. The data for the evaluation trials were collected outside North America and are spoken in Tagalog and Cantonese, whereas the training data are spoken only in English. The main issue for SRE16 is therefore compensating for the discrepancy between languages. Using the development dataset, which is spoken in Cebuano and Mandarin, we ran preliminary experiments to prepare for the evaluation trials and to compensate for the language-mismatched condition. Our team developed four different approaches to extract i-vectors and applied state-of-the-art techniques as the backend. To compensate for language mismatch, we investigated several methods: unsupervised language clustering, inter-language variability compensation, and gender/language-dependent score normalization.
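To make two of the ideas named above concrete, the following is a minimal sketch of unsupervised clustering followed by cluster-dependent score normalization. It is not the KU-ISPL implementation: the abstract does not specify the clustering algorithm, the scoring backend, or the normalization variant, so the sketch assumes spherical k-means on length-normalized i-vectors, cosine scoring, and Z-norm. All function names (`length_normalize`, `cosine_score`, `kmeans_labels`, `znorm_score`) are hypothetical.

```python
import numpy as np

def length_normalize(x):
    """Scale each vector to unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cosine_score(enroll, test):
    """Cosine similarity between two i-vectors (assumed backend, for illustration)."""
    return float(np.dot(length_normalize(enroll), length_normalize(test)))

def kmeans_labels(vectors, k=2, iters=20, seed=0):
    """Spherical k-means: an assumed stand-in for unsupervised language clustering.

    Returns one cluster label per vector and the unit-length centroids.
    """
    rng = np.random.default_rng(seed)
    x = length_normalize(vectors)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to the centroid with the highest cosine similarity.
        labels = np.argmax(x @ centroids.T, axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = length_normalize(members.mean(axis=0))
    labels = np.argmax(x @ centroids.T, axis=1)
    return labels, centroids

def znorm_score(enroll, test, cohort, cohort_labels, trial_label):
    """Z-norm using only cohort vectors from the same cluster as the trial.

    The cohort subset plays the role of a language-dependent (or
    gender-dependent) normalization set.
    """
    subset = length_normalize(cohort[cohort_labels == trial_label])
    cohort_scores = subset @ length_normalize(enroll)
    raw = cosine_score(enroll, test)
    return (raw - cohort_scores.mean()) / (cohort_scores.std() + 1e-8)
```

In use, unlabeled in-domain vectors would first be clustered with `kmeans_labels`, and each trial would then be normalized against the cohort subset whose cluster label matches the trial. Restricting the cohort this way keeps the normalization statistics from being dominated by a mismatched language or gender.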