CL · Jul 9, 2023

Towards cross-language prosody transfer for dialog

arXiv:2307.04123v1 · 8 citations · h-index: 23
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of preserving prosody in dialog translation for users of speech-to-speech systems. It is incremental, focusing on data collection and analysis rather than a novel solution.

The paper tackles inadequate prosody transfer in speech-to-speech translation for dialog, which can lose nuances of speaker intent and stance. It collects an English-Spanish corpus of 1871 matched utterance pairs and develops a prosodic dissimilarity metric, then uses both to analyze cross-language differences and to evaluate baseline models.

Speech-to-speech translation systems today do not adequately support use for dialog purposes. In particular, nuances of speaker intent and stance can be lost due to improper prosody transfer. We present an exploration of what needs to be done to overcome this. First, we developed a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, and used this to collect an English-Spanish corpus, so far comprising 1871 matched utterance pairs. Second, we developed a simple prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features. We then used these to investigate cross-language prosodic differences, measure the likely utility of three simple baseline models, and identify phenomena which will require more powerful modeling. Our findings should inform future research on cross-language prosody and the design of speech-to-speech translation systems capable of effective prosody transfer.
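
The metric described in the abstract, Euclidean distance over a set of prosodic features, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the features or say how they are normalized, so the z-score step, the function names, and the three-feature vectors below are all assumptions made for illustration.

```python
import numpy as np


def zscore(features: np.ndarray) -> np.ndarray:
    """Standardize each feature column to zero mean and unit variance.

    Assumption: normalization keeps features with large numeric ranges
    from dominating the distance. The paper may normalize differently.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled to avoid dividing by zero
    return (features - mean) / std


def prosodic_dissimilarity(a, b) -> float:
    """Euclidean distance between two prosodic feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


if __name__ == "__main__":
    # Hypothetical values: one row per utterance, columns stand in for
    # pitch, intensity, and speaking-rate features.
    english = np.array([[0.2, 1.1, -0.3], [0.9, -0.4, 0.5]])
    spanish = np.array([[0.1, 0.8, -0.1], [1.2, -0.6, 0.9]])

    # Normalize over the pooled data so both languages share one scale.
    pooled = zscore(np.vstack([english, spanish]))
    en_z, es_z = pooled[: len(english)], pooled[len(english):]

    for i, (e, s) in enumerate(zip(en_z, es_z)):
        print(f"pair {i}: dissimilarity = {prosodic_dissimilarity(e, s):.3f}")
```

Under this reading, a low value for a matched pair suggests the re-enacted utterance keeps a similar prosodic profile across languages, while a high value flags a pair whose prosody differs.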

Code Implementations: 1 repo