AS CL SD

Sep 13, 2019

Probing the Information Encoded in X-vectors

Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur

arXiv:1909.06351v2

99 citations

AI Analysis

This work provides insights into the information encoded in speaker embeddings, which is important for researchers in speech processing, but it is incremental as it builds on existing x-vector methods.

The paper investigates what information x-vector speaker embeddings encode, using simple classifiers to probe for speaker, channel, transcription, and utterance meta information (duration and augmentation type). It finds that x-vectors capture spoken content and channel-related information while still performing well on speaker verification tasks.

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by x-vector embeddings. We probe these embeddings for information related to the speaker, channel, transcription (sentence, words, phones), and meta information about the utterance (duration and augmentation type), and compare these with the information encoded by i-vectors across a varying number of dimensions. We also study the effect of data augmentation during extractor training on the information captured by x-vectors. Experiments on the RedDots data set show that x-vectors capture spoken content and channel-related information, while performing well on speaker verification tasks.
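The probing idea described in the abstract can be illustrated with a minimal sketch: train a simple classifier on fixed embeddings to predict some property, and read held-out accuracy as a rough indication of how much of that property the embeddings encode. Everything below is an assumption made for illustration: the embeddings are synthetic Gaussian vectors rather than real x-vectors, the probe is a plain softmax regression, and the function names (`make_synthetic_embeddings`, `train_linear_probe`, `probe_accuracy`) and hyperparameters are hypothetical and not taken from the paper.

```python
import numpy as np


def make_synthetic_embeddings(n, dim, num_classes, signal, seed):
    """Stand-in for real embeddings (assumption): Gaussian noise plus a
    class-dependent offset scaled by `signal`. signal=0 means the property
    is not encoded in the vectors at all."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, num_classes, size=n)
    # Class centers use a fixed seed so train and test sets share them.
    centers = np.random.default_rng(0).normal(size=(num_classes, dim))
    X = rng.normal(size=(n, dim)) + signal * centers[y]
    return X, y


def train_linear_probe(X, y, num_classes, lr=0.1, epochs=200):
    """Fit a softmax-regression probe with full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, X, y):
    """Fraction of examples whose highest-scoring class matches the label."""
    return float(np.mean((X @ W + b).argmax(axis=1) == y))


if __name__ == "__main__":
    for signal in (1.0, 0.0):
        X_tr, y_tr = make_synthetic_embeddings(400, 32, 4, signal, seed=1)
        X_te, y_te = make_synthetic_embeddings(400, 32, 4, signal, seed=2)
        W, b = train_linear_probe(X_tr, y_tr, num_classes=4)
        print(f"signal={signal}: held-out accuracy = "
              f"{probe_accuracy(W, b, X_te, y_te):.2f}")
```

In this sketch, held-out accuracy well above chance suggests the property is linearly recoverable from the embeddings, while accuracy near chance suggests it is not. A low-capacity probe is used deliberately, so that high accuracy reflects the embedding rather than the classifier.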
