Modeling Language Usage and Listener Engagement in Podcasts
This work addresses the need for data-driven insights into podcast creation, for both creators and researchers. The contribution is incremental, since it builds on existing wisdom with new analysis.
The paper examines how linguistic style correlates with listener engagement in podcasts. It analyzes vocabulary diversity, distinctiveness, emotion, and syntax in creators' written descriptions and audio transcripts, and shows that models built on these features are highly predictive of engagement.
While there is an abundance of popular writing targeted to podcast creators on how to speak in ways that engage their listeners, there has been little data-driven analysis of podcasts that relates linguistic style to listener engagement. In this paper, we investigate how various factors -- vocabulary diversity, distinctiveness, emotion, and syntax, among others -- correlate with engagement, based on analysis of the creators' written descriptions and transcripts of the audio. We build models with different textual representations, and show that the identified features are highly predictive of engagement. Our analysis tests popular wisdom about stylistic elements in high-engagement podcasts, corroborating some aspects, and adding new perspectives on others.
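To make one of the features named above concrete, the following is a minimal sketch of a vocabulary diversity score computed as a type-token ratio (distinct words divided by total words). The abstract does not say how diversity is measured, so the choice of type-token ratio, the function name `type_token_ratio`, the simple regex tokenizer, and the two sample strings are all assumptions made for illustration, not the paper's method.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct word types divided by total word tokens.

    Assumption: a lowercase regex tokenizer stands in for whatever
    tokenization a real pipeline would use. Empty text scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical snippets standing in for podcast descriptions or transcripts.
repetitive = "the show is the show and the show goes on"
varied = "each episode explores a different corner of modern science"

print(type_token_ratio(repetitive))  # 6 distinct words out of 10
print(type_token_ratio(varied))      # 9 distinct words out of 9
```

Type-token ratio falls as a text gets longer, so comparing a short description with a long transcript would need a length-controlled variant, such as averaging the ratio over fixed-size windows.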