Multilayer bootstrap network for unsupervised speaker recognition
This work addresses speaker recognition without labeled data, but it is incremental as it applies an existing unsupervised method to a specific domain.
The paper tackles unsupervised speaker recognition in three steps: it extracts supervectors from an unsupervised universal background model, reduces their dimensionality with a multilayer bootstrap network, and clusters the resulting low-dimensional data. The authors report that the method is effective and robust relative to existing methods.
We apply the multilayer bootstrap network (MBN), a recently proposed unsupervised learning method, to unsupervised speaker recognition. The proposed method first extracts supervectors from an unsupervised universal background model, then reduces the dimension of the high-dimensional supervectors with MBN, and finally performs unsupervised speaker recognition by clustering the low-dimensional data. Comparisons with two unsupervised and one supervised speaker recognition techniques demonstrate the effectiveness and robustness of the proposed method.
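
The dimensionality-reduction step can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not spell out MBN's internals. The sketch assumes the commonly described construction in which each layer is an ensemble of k-centroids clusterings whose centroids are sampled at random from the layer's input, and each point is encoded by the one-hot index of its nearest centroid. The function names, the layer sizes in `ks`, and the synthetic stand-in for supervectors are all hypothetical. Supervector extraction, any final projection to a low-dimensional space, and the clustering step are omitted.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kcentroids_layer(data, k, n_clusterings, rng):
    """One MBN-style layer (simplified, assumed construction).

    Each of the n_clusterings clusterings samples k input points as
    centroids, then encodes every point as a one-hot vector marking its
    nearest centroid. The one-hot vectors are concatenated per point.
    """
    codes = [[] for _ in data]
    for _ in range(n_clusterings):
        centroids = rng.sample(data, k)  # bootstrap: centroids are data points
        for i, x in enumerate(data):
            nearest = min(range(k), key=lambda j: sq_dist(x, centroids[j]))
            one_hot = [0.0] * k
            one_hot[nearest] = 1.0
            codes[i].extend(one_hot)
    return codes


def mbn_encode(data, ks, n_clusterings=20, seed=0):
    """Stack layers, feeding each layer's sparse codes to the next.

    ks lists the number of centroids per layer; it is assumed to shrink
    from bottom to top (hypothetical values in the usage below).
    """
    rng = random.Random(seed)
    out = data
    for k in ks:
        out = kcentroids_layer(out, k, n_clusterings, rng)
    return out


# Synthetic stand-in for supervectors from two speakers (not real data).
rng = random.Random(1)
speaker_a = [[rng.gauss(0.0, 0.3) for _ in range(8)] for _ in range(6)]
speaker_b = [[rng.gauss(5.0, 0.3) for _ in range(8)] for _ in range(6)]

codes = mbn_encode(speaker_a + speaker_b, ks=[6, 3], n_clusterings=5)
print(len(codes), len(codes[0]))  # 12 codes, each of length 5 * 3 = 15
```

In a full pipeline, the top-layer codes would be passed to a clustering algorithm to group utterances by speaker; which algorithm the paper uses is not stated in the abstract.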