CL, AS · Aug 1, 2024

Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation

arXiv:2408.00205v1 · 9 citations · h-index: 41
Originality: Incremental advance
AI Analysis

This paper addresses the problem of generating concise text summaries from speech in real time, for applications such as meeting transcription. The contribution is incremental, as it builds on existing ASR and summarization methods.

The paper defines the sentence-wise speech summarization task, introduces two datasets for it, and proposes an end-to-end model trained with knowledge distillation, which improves performance (for example, a 2.1-point ROUGE-1 gain on one dataset).

This paper introduces a novel approach called sentence-wise speech summarization (Sen-SSum), which generates text summaries from a spoken document in a sentence-by-sentence manner. Sen-SSum combines the real-time processing of automatic speech recognition (ASR) with the conciseness of speech summarization. To explore this approach, we present two datasets for Sen-SSum: Mega-SSum and CSJ-SSum. Using these datasets, our study evaluates two types of Transformer-based models: 1) cascade models that combine ASR and strong text summarization models, and 2) end-to-end (E2E) models that directly convert speech into a text summary. While E2E models are appealing for building compute-efficient systems, they perform worse than cascade models. Therefore, we propose knowledge distillation for E2E models using pseudo-summaries generated by the cascade models. Our experiments show that this proposed knowledge distillation effectively improves the performance of the E2E model on both datasets.
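The distillation step described in the abstract can be pictured as a data-preparation pass: the cascade (ASR followed by a text summarizer) labels each spoken sentence with a pseudo-summary, and the E2E model is then trained on those speech-to-summary pairs. The following is a minimal sketch of that idea only, not the authors' implementation. The names `Utterance`, `cascade_summarize`, and `build_distillation_pairs` are hypothetical, the ASR and summarizer are passed in as plain callables, and mixing human references with pseudo-summaries is an assumption of this sketch rather than something the abstract states.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Utterance:
    """One spoken sentence (hypothetical container for this sketch)."""
    audio_id: str                    # stand-in for the audio, e.g. a path or feature key
    reference: Optional[str] = None  # human-written summary, if one exists


def cascade_summarize(
    audio_id: str,
    asr: Callable[[str], str],
    summarizer: Callable[[str], str],
) -> str:
    """Teacher: transcribe the audio, then summarize the transcript."""
    return summarizer(asr(audio_id))


def build_distillation_pairs(
    utterances: List[Utterance],
    asr: Callable[[str], str],
    summarizer: Callable[[str], str],
    keep_references: bool = True,
) -> List[Tuple[str, str]]:
    """Build (audio_id, target_summary) training pairs for the E2E student.

    Every utterance gets a pseudo-summary from the cascade teacher.
    Assumption of this sketch: human references, where present, are
    kept alongside the pseudo-summaries when keep_references is True.
    """
    pairs: List[Tuple[str, str]] = []
    for utt in utterances:
        if keep_references and utt.reference is not None:
            pairs.append((utt.audio_id, utt.reference))
        pairs.append((utt.audio_id, cascade_summarize(utt.audio_id, asr, summarizer)))
    return pairs


if __name__ == "__main__":
    # Toy stand-ins for real models, for illustration only.
    transcripts = {"utt1": "well um the meeting starts at noon today"}
    toy_asr = lambda audio_id: transcripts[audio_id]
    toy_summarizer = lambda text: " ".join(text.split()[-5:])
    print(build_distillation_pairs([Utterance("utt1")], toy_asr, toy_summarizer))
```

Passing the teacher components as callables keeps the sketch independent of any particular ASR or summarization toolkit; in practice the resulting pairs would feed an ordinary sequence-to-sequence training loop for the E2E model.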
