CL SD AS
Jul 17, 2023

BASS: Block-wise Adaptation for Speech Summarization

Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj

CMU, Meta AI

arXiv:2307.08217v11.78 citations
h-index: 83

Originality: Incremental advance

AI Analysis

This work addresses the challenge of handling very long audio inputs in speech summarization. It offers a practical solution for applications such as transcription or content analysis, though the contribution is incremental.

The paper tackles the problem of training end-to-end speech summarization models on very long sequences by proposing a block-wise adaptation method, which improves performance by 3 points absolute on ROUGE-L over a truncated input baseline.

End-to-end speech summarization has been shown to improve performance over cascade baselines. However, such models are difficult to train on very large inputs (dozens of minutes or hours) owing to compute restrictions and are hence trained with truncated model inputs. Truncation leads to poorer models, and a solution to this problem rests in block-wise modeling, i.e., processing a portion of the input frames at a time. In this paper, we develop a method that allows one to train summarization models on very long sequences in an incremental manner. Speech summarization is realized as a streaming process, where hypothesis summaries are updated every block based on new acoustic information. We devise and test strategies to pass semantic context across the blocks. Experiments on the How2 dataset demonstrate that the proposed block-wise training method improves by 3 points absolute on ROUGE-L over a truncated input baseline.
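The block-wise idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `split_into_blocks`, `blockwise_summarize`, and `toy_step` are hypothetical, and the "model" here is a toy stand-in that carries a running statistic as its context. The sketch only shows the control flow of processing a few frames at a time, updating a hypothesis after each block, and passing context forward.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Frame = List[float]
Context = Tuple[float, int]  # hypothetical context: (running sum, value count)


def split_into_blocks(frames: Sequence[Frame], block_size: int) -> Iterator[Sequence[Frame]]:
    """Yield consecutive, non-overlapping blocks of at most block_size frames."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(frames), block_size):
        yield frames[start:start + block_size]


def blockwise_summarize(
    frames: Sequence[Frame],
    block_size: int,
    step: Callable[[Sequence[Frame], Context], Tuple[Context, str]],
    initial_context: Context,
) -> List[str]:
    """Process the input one block at a time.

    After each block, `step` returns an updated context (carried to the
    next block) and an updated hypothesis. All hypotheses are returned so
    the streaming behaviour is visible; the last one is the final output.
    """
    context = initial_context
    hypotheses: List[str] = []
    for block in split_into_blocks(frames, block_size):
        context, hypothesis = step(block, context)
        hypotheses.append(hypothesis)
    return hypotheses


def toy_step(block: Sequence[Frame], context: Context) -> Tuple[Context, str]:
    """Toy stand-in for a summarization model (an assumption, not the paper's model).

    The context accumulates a running sum and count over all frames seen
    so far; the "hypothesis" is the running mean formatted as text.
    """
    total, count = context
    for frame in block:
        total += sum(frame)
        count += len(frame)
    mean = total / count if count else 0.0
    return (total, count), f"running mean={mean:.2f}"


if __name__ == "__main__":
    frames = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    for hypothesis in blockwise_summarize(frames, 2, toy_step, (0.0, 0)):
        print(hypothesis)
```

In a real system, `step` would wrap an encoder-decoder model and the context would be whatever semantic state is passed between blocks; the abstract does not specify those details, so they are left abstract here.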
