CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center
This work addresses the need for domain-specific empathetic dialogue data for speech synthesis in customer centers. Its contribution is incremental, however, because it extends an existing corpus to a new domain.
The authors address the lack of a Japanese speech corpus for empathetic dialogue in customer service by creating CALLS, a new corpus based on simulated phone calls. They find that mixing it with an existing teacher-student corpus during training yields biased improvements in synthetic speech quality, because the two corpora differ in expressiveness.
We present CALLS, a Japanese speech corpus that treats phone calls in a customer center as a new domain of empathetic spoken dialogue. The existing STUDIES corpus covers only empathetic dialogue between a teacher and a student in a school. To extend the application range of empathetic dialogue speech synthesis (EDSS), we designed our corpus to include the same female speaker as the STUDIES teacher, acting as an operator in simulated phone calls. We describe the corpus construction methodology and analyze the recorded speech. We also conduct EDSS experiments using the CALLS and STUDIES corpora to investigate the effect of domain differences. The results show that mixing the two corpora during training causes biased improvements in the quality of synthetic speech due to their different degrees of expressiveness. The project page for the corpus is http://sython.org/Corpus/STUDIES-2.