AS · CL · SD · Jul 31, 2020

Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

arXiv:2007.15868v1 · 9 citations
Originality: Incremental advance
AI Analysis

This work addresses accurate transcription of real meetings, for applications such as meeting documentation, and offers incremental improvements over existing methods.

The paper tackles meeting transcription with asynchronous distributed microphones. Using 11 distributed microphones, it achieves a character error rate (CER) of 28.7%, compared with 38.2% for a single monaural microphone at the center of the table. It also reports a CER of 21.8%, only 2.1 percentage points above headset-microphone-based transcription.
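CER, the metric behind these figures, is the character-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference, divided by the number of reference characters. The sketch below is a minimal plain-Python illustration of that standard definition. The function names are illustrative, and this is not the paper's scoring tool, since the summary does not say which scorer was used. The last lines only redo the arithmetic on the numbers reported above.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences (single-row DP)."""
    prev = list(range(len(hyp) + 1))  # cost of turning an empty ref prefix into each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("abcd", "abxd"))  # one substitution in four characters → 0.25

    # Arithmetic on the reported figures (CER in percent).
    mono, distributed, best, gap_to_headset = 38.2, 28.7, 21.8, 2.1
    print(f"relative CER reduction: {100 * (mono - distributed) / mono:.1f}%")  # → 24.9%
    print(f"implied headset CER: {best - gap_to_headset:.1f}%")                # → 19.7%
```

Because CER counts insertions, it can exceed 100% when a system outputs much more text than the reference; that is one reason duplicated transcriptions of the same utterance from several microphones would directly inflate the error rate.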

A novel framework for meeting transcription using asynchronous microphones is proposed in this paper. It consists of audio synchronization, speaker diarization, utterance-wise speech enhancement using guided source separation, automatic speech recognition, and duplication reduction. Performing speaker diarization before speech enhancement enables the system to deal with overlapped speech without considering the sampling frequency mismatch between microphones. Evaluation on our real meeting datasets showed that our framework achieved a character error rate (CER) of 28.7% using 11 distributed microphones, while a monaural microphone placed at the center of the table had a CER of 38.2%. We also showed that our framework achieved a CER of 21.8%, which is only 2.1 percentage points higher than that of headset-microphone-based transcription.
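The abstract lists audio synchronization as the first stage but does not say how it is done. A common way to align recordings from independent devices is to choose the time lag that maximizes the cross-correlation between a reference channel and each other channel. The sketch below assumes that approach and uses NumPy; `estimate_offset` and `align` are hypothetical names, not the paper's code. It estimates only one constant offset per channel and does not model sampling frequency mismatch between devices.

```python
import numpy as np


def estimate_offset(ref: np.ndarray, sig: np.ndarray) -> int:
    """Estimate lag d (in samples) such that sig[n] ≈ ref[n - d].

    Positive d means the same sound appears d samples later in `sig` than in `ref`.
    """
    # np.correlate(a, v, "full") gives c[k] = sum_n a[n + k] * v[n] for
    # k = -(len(v) - 1) ... len(a) - 1; output index i maps to k = i - (len(v) - 1).
    corr = np.correlate(sig, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)


def align(sig: np.ndarray, lag: int) -> np.ndarray:
    """Shift `sig` by `lag` samples so that aligned[n] ≈ ref[n]; zero-pads, keeps length."""
    if abs(lag) >= len(sig):
        raise ValueError("lag must be shorter than the signal")
    out = np.zeros_like(sig)
    if lag >= 0:
        out[: len(sig) - lag] = sig[lag:]   # drop the leading `lag` samples
    else:
        out[-lag:] = sig[: len(sig) + lag]  # delay by `-lag` samples
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(2000)                        # stand-in for the reference channel
    delay = 37
    sig = np.concatenate([np.zeros(delay), ref[:-delay]])  # same sound, 37 samples later
    sig += 0.1 * rng.standard_normal(sig.size)             # independent sensor noise
    lag = estimate_offset(ref, sig)
    print(lag)  # → 37
    aligned = align(sig, lag)
```

`np.correlate` computes the correlation directly, which is quadratic in the signal length; for long meeting recordings an FFT-based correlation such as `scipy.signal.correlate(..., method="fft")` would be the practical choice.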

