LG | SD | SP | Dec 17, 2025

O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization

arXiv:2512.15229v1 | ICASSP
Originality: Incremental advance
AI Analysis

This work offers a more efficient approach to real-time speaker diarization for applications such as conversational telephone speech, though it is an incremental advance over prior end-to-end approaches.

The paper tackles speaker diarization by developing O-EENC-SD, an online end-to-end system that improves efficiency over existing methods while maintaining competitive performance, achieving a good trade-off between diarization error rate and complexity on the CallHome dataset.

We introduce O-EENC-SD: an end-to-end online speaker diarization system based on EEND-EDA, featuring a novel RNN-based stitching mechanism for online prediction. In particular, we develop a novel centroid refinement decoder whose usefulness is assessed through a rigorous ablation study. Our system provides key advantages over existing methods: a hyperparameter-free solution compared to unsupervised clustering approaches, and a more efficient alternative to current online end-to-end methods, which are computationally costly. We demonstrate that O-EENC-SD is competitive with the state of the art in the two-speaker conversational telephone speech domain, as tested on the CallHome dataset. Our results show that O-EENC-SD provides a great trade-off between DER and complexity, even when working on independent chunks with no overlap, making the system extremely efficient.
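For readers unfamiliar with the DER metric mentioned in the abstract, here is a minimal sketch of a frame-level diarization error rate. It is not the paper's evaluation code: the function name `frame_der`, the per-frame binary activity format, and the absence of a forgiveness collar are all assumptions made for illustration. Published results are normally produced with standard scoring tools whose settings may differ.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level DER sketch (hypothetical helper, no collar).

    ref, hyp: equal-length lists of per-frame tuples, one 0/1 activity
    flag per speaker, e.g. (1, 0) means only speaker 0 is talking.
    Returns (missed + false alarm + confusion) / total reference speech,
    minimised over all mappings of hypothesis speakers to reference speakers.
    """
    n_spk = len(ref[0])
    total_ref = sum(sum(frame) for frame in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    best_errors = None
    for perm in permutations(range(n_spk)):
        errors = 0
        for r, h in zip(ref, hyp):
            h_mapped = [h[p] for p in perm]
            n_ref, n_hyp = sum(r), sum(h_mapped)
            n_correct = sum(1 for a, b in zip(r, h_mapped) if a and b)
            # miss + false alarm + confusion collapses to this expression
            errors += max(n_ref, n_hyp) - n_correct
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_ref


# Toy two-speaker example: the hypothesis has the speaker labels swapped
# and misses the second speaker in the overlapped last frame.
ref = [(1, 0), (1, 0), (0, 1), (1, 1)]
hyp = [(0, 1), (0, 1), (1, 0), (1, 0)]
print(frame_der(ref, hyp))
```

Because the score is minimised over speaker permutations, the swapped labels in the toy example cost nothing; only the missed speaker in the overlapped frame counts as an error.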

