Dialogs Re-enacted Across Languages
This work addresses the need for bilingual dialog data among researchers in speech-to-speech translation. Its contribution is incremental, since it focuses on data collection rather than novel methods.
The authors collected closely matched bilingual dialog data to support cross-language prosodic mapping and improvements in speech-to-speech translation. The result is a publicly released corpus and a protocol for data collection.
To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection and its public release, and some observations and musings. This report is intended for people using this corpus, people extending it, and people designing similar collections of bilingual dialog data.