CL · AI · Aug 21, 2021

Hierarchical Summarization for Longform Spoken Dialog

arXiv:2108.09597v1 · 27 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the difficulty users face when navigating unstructured spoken dialog, but its contribution is incremental because it builds on existing ASR and summarization methods.

The authors tackle poor automated understanding and information extraction from spoken dialog by designing a two-stage ASR and text summarization pipeline with semantic segmentation and merging algorithms. In their evaluation, users preferred the resulting hierarchical summaries for quickly skimming audio and identifying content of interest.

Every day we are surrounded by spoken dialog. This medium delivers rich, diverse streams of information auditorily; however, systematically understanding dialog can often be non-trivial. Despite the pervasiveness of spoken dialog, automated speech understanding and quality information extraction remain markedly poor, especially when compared to written prose. Furthermore, compared to understanding text, auditory communication poses many additional challenges such as speaker disfluencies, informal prose styles, and lack of structure. These concerns all demonstrate the need for a distinctly speech-tailored interactive system to help users understand and navigate the spoken language domain. While individual automatic speech recognition (ASR) and text summarization methods already exist, they are imperfect technologies; neither considers user purpose and intent nor addresses the complications that spoken language induces. Consequently, we design a two-stage ASR and text summarization pipeline and propose a set of semantic segmentation and merging algorithms to resolve these speech modeling challenges. Our system enables users to easily browse and navigate content as well as recover from errors in these underlying technologies. Finally, we present an evaluation of the system which highlights user preference for hierarchical summarization as a tool to quickly skim audio and identify content of interest to the user.
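
The abstract does not spell out the segmentation and merging algorithms, so the following is only a minimal sketch of the general idea of hierarchical summarization over an ASR transcript, not the authors' method. It assumes the transcript is already a list of utterance strings, uses word-overlap cosine similarity as a stand-in for semantic similarity, and uses a placeholder summarizer that simply keeps the longest text. The names `segment`, `summarize`, `build_hierarchy`, and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a stand-in for a real semantic embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(utterances, threshold=0.2):
    """Start a new segment when adjacent utterances share too little vocabulary."""
    if not utterances:
        return []
    segments = [[utterances[0]]]
    for prev, cur in zip(utterances, utterances[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


def summarize(texts):
    """Placeholder (assumption): keep the longest text.
    A real system would call a text summarization model here."""
    return max(texts, key=len)


def build_hierarchy(utterances, threshold=0.2):
    """Return summary levels, most detailed first, ending in one top-level summary."""
    if not utterances:
        return []
    level = [summarize(seg) for seg in segment(utterances, threshold)]
    levels = [level]
    while len(level) > 1:
        # Merge neighbouring summaries pairwise, then summarize each merged pair.
        level = [summarize(level[i:i + 2]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


transcript = [
    "the budget for next quarter is tight",
    "we need to cut the budget for travel",
    "lunch today is pizza",
    "the pizza place is close",
]
levels = build_hierarchy(transcript)
```

Here `levels[0]` holds one summary per detected segment and `levels[-1]` holds a single top-level summary, so a reader could start at the top and drill down into the segment of interest. Swapping in a real embedding model and summarizer would change quality, not the overall structure of the sketch.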
