AS · SD · Aug 4, 2020

MIRNet: Learning multiple identities representations in overlapped speech

arXiv:2008.01698v2 · 9 citations
AI Analysis

This paper addresses the challenge of identifying speakers in multi-speaker recordings, with applications in areas such as audio processing and security.

The paper tackles the problem of extracting multiple speaker identities from overlapped speech. It proposes a novel deep representation strategy and demonstrates its effectiveness on speaker verification and speech separation tasks.

Many approaches can derive information about a single speaker's identity from speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when there are multiple concurrent speakers in a given signal. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speaker identities from overlapped speech. We design a network that can extract a high-level embedding that contains information about each speaker's identity from a given mixture. Unlike conventional approaches that need reference acoustic features for training, our proposed algorithm only requires the speaker identity labels of the overlapped speech segments. We demonstrate the effectiveness and usefulness of our algorithm in a speaker verification task and a speech separation system conditioned on the target speaker embeddings obtained through the proposed method.
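
The abstract states that training needs only the speaker identity labels of each overlapped segment, and that the resulting embeddings are evaluated on speaker verification. A minimal sketch of what that kind of supervision and scoring could look like is given below. It is an illustration under assumptions, not the paper's method: the multi-hot target, the binary cross-entropy loss, the cosine scoring, and all function names are hypothetical choices, and random numbers stand in for real audio and for the network's output.

```python
import numpy as np


def mix_utterances(wave_a, wave_b):
    """Create an overlapped signal by summing two mono waveforms.

    The longer waveform is trimmed to the length of the shorter one.
    """
    n = min(len(wave_a), len(wave_b))
    return wave_a[:n] + wave_b[:n]


def multi_hot(speaker_ids, num_speakers):
    """Build a label vector with a 1 for every speaker present in the mixture."""
    target = np.zeros(num_speakers, dtype=np.float64)
    target[list(speaker_ids)] = 1.0
    return target


def multilabel_bce(logits, target):
    """Mean binary cross-entropy over speakers, computed stably from logits.

    Assumed loss for illustration; the source does not state the objective.
    """
    per_speaker = (
        np.maximum(logits, 0.0)
        - logits * target
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(np.mean(per_speaker))


def cosine_score(emb_a, emb_b):
    """Cosine similarity between two embeddings, a common verification score."""
    return float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))


rng = np.random.default_rng(0)

# Placeholder waveforms standing in for two single-speaker utterances.
wave_a = rng.standard_normal(16000)
wave_b = rng.standard_normal(12000)
mixture = mix_utterances(wave_a, wave_b)

# The only supervision: which speakers (hypothetical IDs 3 and 7) are present.
target = multi_hot({3, 7}, num_speakers=10)

# Placeholder for the output of a network applied to the mixture.
logits = rng.standard_normal(10)
loss = multilabel_bce(logits, target)
```

Note that no clean reference signal for either speaker appears in the loss; the target is built from identity labels alone, which is the property the abstract highlights.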

