AS / LG / SD, Feb 21, 2019

All-neural online source separation, counting, and diarization for meeting analysis

arXiv:1902.07881v1, 47 citations
Originality: Highly original
AI Analysis

This work addresses the need for integrated, real-time processing in meeting analysis. Its novelty lies in combining several tasks in a single model, although it builds incrementally on existing neural methods.

The paper tackles simultaneous speaker counting, diarization, and source separation for meeting analysis with an all-neural, block-online approach. In simulation experiments it achieves state-of-the-art separation performance along with good diarization and counting results, and it generalizes to a large number of blocks not seen during training.

Automatic meeting analysis comprises the tasks of speaker counting, speaker diarization, and the separation of overlapped speech, followed by automatic speech recognition. This all has to be carried out on arbitrarily long sessions and, ideally, in an online or block-online manner. While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation. The NN-based estimator operates in a block-online fashion and tracks speakers even if they remain silent for a number of time blocks, thus learning a stable output order for the separated sources. The neural network is recurrent over time as well as over the number of sources. The simulation experiments show that state of the art separation performance is achieved, while at the same time delivering good diarization and source counting results. It even generalizes well to an unseen large number of blocks.
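
The block-online control flow described in the abstract can be illustrated with a toy sketch. This is not the paper's network: the class `SlotTracker` and its methods are hypothetical names, and the per-block speaker labels are assumed to be given by an oracle, whereas the actual model has to infer them from audio. The sketch only shows how state carried across blocks keeps each speaker on a fixed output slot, even after a silent block, and how the speaker count falls out of that state.

```python
from typing import Dict, List


class SlotTracker:
    """Toy stand-in for a block-online estimator (hypothetical, not the paper's model).

    State persists across blocks, so a speaker keeps the same output
    slot even after being silent for one or more blocks.
    """

    def __init__(self) -> None:
        # Maps an (oracle) speaker label to its fixed output slot.
        self.slot_of: Dict[str, int] = {}

    def process_block(self, active_speakers: List[str]) -> List[int]:
        # Open a new slot for every speaker not seen in earlier blocks.
        for speaker in active_speakers:
            if speaker not in self.slot_of:
                self.slot_of[speaker] = len(self.slot_of)
        # One activity flag per known slot: this is the diarization output
        # for the current block.
        activity = [0] * len(self.slot_of)
        for speaker in active_speakers:
            activity[self.slot_of[speaker]] = 1
        return activity

    @property
    def speaker_count(self) -> int:
        # Number of distinct speakers observed so far.
        return len(self.slot_of)


# Assumed oracle input: which speakers talk in each consecutive block.
blocks = [["A"], ["A", "B"], ["B"], [], ["A"]]

tracker = SlotTracker()
diarization = [tracker.process_block(block) for block in blocks]

print(diarization)            # per-block activity, one entry per slot
print(tracker.speaker_count)  # speakers counted over the whole session
```

In this toy run speaker A is silent in the third and fourth blocks but returns to slot 0 in the fifth, which is the kind of stable output order the abstract refers to.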
