CL SD AS Jun 20, 2024

System Description for the Displace Speaker Diarization Challenge 2023

arXiv:2406.15516v1
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement for speaker diarization in conversational environments, addressing a specific challenge dataset.

The paper describes a speaker diarization system for the Displace 2023 challenge that combines voice activity detection (VAD), ResNet-based CNN feature extraction, and spectral clustering. It achieves diarization error rates (DER) of 27.1% and 27.4% on the development and phase-1 evaluation sets, respectively, despite not being trained on Hindi.

This paper describes our solution for the Diarization of Speaker and Language in Conversational Environments Challenge (Displace 2023). We used a combination of VAD for finding segments with speech, a ResNet-based CNN for feature extraction from these segments, and spectral clustering for clustering the features. Even though it was not trained on Hindi, the described algorithm achieves a DER of 27.1% on the development part and 27.4% on the phase-1 evaluation part of the dataset.
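The clustering stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the affinity function, the number of speakers, or how cluster labels are assigned, so everything below is an assumption. The sketch takes per-segment embeddings (standing in for the CNN features), builds a cosine-based affinity matrix, and splits the segments into two groups using the sign of the second eigenvector of the normalised graph Laplacian (the Fiedler vector). This two-way split is a simplification of full spectral clustering, which typically handles more speakers by running k-means on several eigenvectors. The function name `spectral_cluster_two_speakers` is hypothetical.

```python
import numpy as np


def spectral_cluster_two_speakers(embeddings):
    """Split segment embeddings into two groups (assumed: exactly two speakers).

    embeddings: array of shape (n_segments, dim), one row per speech segment.
    Returns an integer array of 0/1 labels; which group gets 0 is arbitrary.
    """
    # L2-normalise rows so that the dot product equals cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Assumed affinity: cosine similarity rescaled from [-1, 1] to [0, 1].
    affinity = (x @ x.T + 1.0) / 2.0
    np.fill_diagonal(affinity, 0.0)  # no self-loops

    # Symmetric normalised Laplacian: D^{-1/2} (D - W) D^{-1/2}.
    degree = affinity.sum(axis=1)
    laplacian = np.diag(degree) - affinity
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    norm_laplacian = d_inv_sqrt[:, None] * laplacian * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, eigvecs = np.linalg.eigh(norm_laplacian)
    fiedler = eigvecs[:, 1]

    # The sign of the Fiedler vector gives a two-way partition of the graph.
    return (fiedler > 0).astype(int)


if __name__ == "__main__":
    # Synthetic stand-in for CNN embeddings: two noisy groups of segments.
    rng = np.random.default_rng(0)
    speaker_a = np.array([1.0, 0.0]) + 0.05 * rng.standard_normal((5, 2))
    speaker_b = np.array([0.0, 1.0]) + 0.05 * rng.standard_normal((5, 2))
    print(spectral_cluster_two_speakers(np.vstack([speaker_a, speaker_b])))
```

In a full pipeline, the rows of `embeddings` would come from running the feature extractor on the speech segments that VAD returns, and the resulting labels would be mapped back to segment timestamps to form the diarization output.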
