CL · Aug 24, 2020

A Baseline Analysis for Podcast Abstractive Summarization

Chujie Zheng, Harry Jiannan Wang, Kunpeng Zhang, Ling Fan

arXiv:2008.10648v2

Originality: Synthesis-oriented

AI Analysis

This work addresses podcast summarization for recommendation systems and downstream applications. Its contribution is incremental, however: it provides a baseline analysis rather than proposing new methods.

This paper tackles automatic podcast summarization through a baseline analysis on the Spotify Podcast Dataset from TREC 2020. It finds that existing models struggle with podcasts because podcasts are longer, more conversational, and noisier than news text.

A podcast's summary is an important factor in end users' listening decisions. It has often been considered a critical feature in podcast recommendation systems and in many downstream applications. Existing abstractive summarization approaches are mainly built on models fine-tuned on professionally edited texts such as CNN and DailyMail news. Unlike news, podcasts are often longer, more colloquial and conversational, and noisier, with content about commercials and sponsorships. These differences make automatic podcast summarization extremely challenging. This paper presents a baseline analysis of podcast summarization using the Spotify Podcast Dataset provided by TREC 2020. It aims to help researchers understand current state-of-the-art pre-trained models and thereby build a foundation for creating better models.

