AS · CL · SD · Apr 23, 2018

Towards an Unsupervised Entrainment Distance in Conversational Speech using Deep Neural Networks

arXiv:1804.08782v1 · 15 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of quantifying acoustic adaptation in conversations, with applications such as mental health assessment. The contribution is incremental: it builds on existing entrainment concepts and adds a new metric.

The paper tackles the problem of measuring conversational entrainment by proposing an unsupervised Neural Entrainment Distance (NED) based on deep neural networks. It validates the measure in two ways: by distinguishing real conversations from fake ones, and by linking high NED to emotional bond ratings in suicide assessment interviews.

Entrainment is a known adaptation mechanism that causes interaction participants to adapt or synchronize their acoustic characteristics. Understanding how interlocutors tend to adapt to each other's speaking style through entrainment involves measuring a range of acoustic features and comparing them via multiple signal comparison methods. In this work, we present a turn-level distance measure obtained in an unsupervised manner using a Deep Neural Network (DNN) model, which we call Neural Entrainment Distance (NED). This metric establishes a framework that learns an embedding from the population-wide entrainment in an unlabeled training corpus. We use the framework for a set of acoustic features and validate the measure experimentally by showing its efficacy in distinguishing real conversations from fake ones created by randomly shuffling speaker turns. Moreover, we show real-world evidence of the validity of the proposed measure. We find that a high value of NED is associated with high ratings of emotional bond in suicide assessment interviews, which is consistent with prior studies.
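
The real-versus-fake validation described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: plain Euclidean distance on raw per-turn feature vectors stands in for the learned DNN embedding, consecutive turns are assumed to come from alternating speakers, and the names `mean_turn_distance` and `shuffled_copy`, along with the toy 2-D features, are hypothetical.

```python
import random
from math import dist


def mean_turn_distance(turns):
    """Average Euclidean distance between each turn and the one after it.

    Assumption: a stand-in for a turn-level entrainment distance; the paper
    computes its distance in a learned embedding space instead.
    """
    distances = [dist(a, b) for a, b in zip(turns, turns[1:])]
    return sum(distances) / len(distances)


def shuffled_copy(turns, seed=0):
    """Build a 'fake' conversation: the same turns in a random order."""
    rng = random.Random(seed)
    fake = list(turns)
    rng.shuffle(fake)
    return fake


# Toy conversation: per-turn features that drift slowly, so neighbouring
# turns are acoustically close (hypothetical data, not from the paper).
real = [(float(i), 0.5 * i) for i in range(10)]
fake = shuffled_copy(real)

real_score = mean_turn_distance(real)
fake_score = mean_turn_distance(fake)
print(f"real: {real_score:.3f}  fake: {fake_score:.3f}")
```

In this toy setting the shuffled conversation can never score lower than the real one, because neighbouring turns in the real sequence are already as close as any pair of turns gets. On real speech the separation is statistical rather than guaranteed, which is why the paper evaluates it experimentally.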

