CL SD AS · Nov 18, 2022

Dialogs Re-enacted Across Languages

Nigel G. Ward, Jonathan E. Avila, Emilia Rivas, Divette Marco

arXiv:2211.11584v2 · 12 citations · h-index: 23 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses speech-to-speech translation researchers' need for bilingual dialog data. Its contribution is incremental, since it focuses on data collection rather than on novel methods.

The authors tackle the problem of collecting closely matched bilingual dialog data to support learning cross-language prosodic mappings and improving speech-to-speech translation. The result is a publicly released corpus and a protocol for data collection.

To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection and its public release, and some observations and musings. This report is intended for people using this corpus, people extending this corpus, and people designing similar collections of bilingual dialog data.

