CL | SD | AS | Apr 10, 2025

Summarizing Speech: A Comprehensive Survey

arXiv:2504.08024v3 | 6 citations | h-index: 4 | EMNLP
Originality: Synthesis-oriented
AI Analysis

This is an incremental survey paper that synthesizes existing knowledge in speech summarization for researchers and practitioners.

This survey examines the field of speech summarization, analyzing existing datasets, evaluation protocols, and recent developments including fine-tuned cascaded architectures and end-to-end solutions, while highlighting ongoing challenges like the need for better benchmarks and multilingual datasets.

Speech summarization has become an essential tool for efficiently managing and accessing the growing volume of spoken and audiovisual content. However, despite its increasing importance, speech summarization remains loosely defined. The field intersects with several research areas, including speech recognition, text summarization, and specific applications like meeting summarization. This survey not only examines existing datasets and evaluation protocols, which are crucial for assessing the quality of summarization approaches, but also synthesizes recent developments in the field, highlighting the shift from traditional systems to advanced models like fine-tuned cascaded architectures and end-to-end solutions. In doing so, we surface the ongoing challenges, such as the need for realistic evaluation benchmarks, multilingual datasets, and long-context handling.

