Towards Modelling Coherence in Spoken Discourse
This work is significant for researchers in natural language processing and speech technology who are working on understanding and generating more natural and coherent spoken interactions.
This paper addresses the limited progress in modelling coherence in spoken discourse by using audio-based coherence models. The study shows that combining the audio modality with text improves coherence models on downstream tasks involving spoken discourse.
While there has been significant progress towards modelling coherence in written discourse, work on modelling coherence in spoken discourse has been quite limited. Unlike coherence in text, coherence in spoken discourse also depends on the prosodic and acoustic patterns in speech. In this paper, we model coherence in spoken discourse with audio-based coherence models. We perform experiments on four coherence-related tasks with spoken discourses. In our experiments, we evaluate machine-generated speech against speech delivered by expert human speakers. We also compare the spoken discourses produced by human language learners of varying language proficiency levels. Our results show that incorporating the audio modality along with the text benefits coherence models in performing downstream coherence-related tasks with spoken discourses.