A case study on using speech-to-translation alignments for language documentation
This work addresses language documentation for low-resource or endangered languages. Its contribution is incremental, since it builds on existing methods for alignment and transcription.
The study asks whether speech-to-translation alignments can improve crowdsourced transcriptions for low-resource languages. A small-scale case study suggests that they can, and a simple phonetically aware string averaging technique produces transcriptions of higher quality.
For many low-resource or endangered languages, spoken language resources are more likely to be annotated with translations than with transcriptions. Recent work exploits such annotations to produce speech-to-translation alignments, without access to any text transcriptions. We investigate whether providing such alignments can aid in producing better (mismatched) crowdsourced transcriptions, which in turn could be valuable for training speech recognition systems, and show through a small-scale proof-of-concept case study that the alignments can indeed be beneficial. We also present a simple phonetically aware string averaging technique that produces transcriptions of higher quality.
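The abstract does not spell out how the string averaging works, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that "averaging" means picking the set median of several crowdsourced transcriptions, and that "phonetically aware" means substitutions between similar sounds cost less in the edit distance. The sound classes, the costs (0.5 and 1.0), and the names `SOUND_CLASSES`, `sub_cost`, `phonetic_distance`, and `set_median` are all hypothetical.

```python
from typing import Sequence

# Hypothetical coarse sound classes (an assumption, not taken from the paper).
SOUND_CLASSES = [
    set("aeiou"), set("pb"), set("td"), set("kg"),
    set("sz"), set("fv"), set("mn"), set("lr"),
]


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: 0 if identical, 0.5 within a sound class, else 1."""
    if a == b:
        return 0.0
    for cls in SOUND_CLASSES:
        if a in cls and b in cls:
            return 0.5
    return 1.0


def phonetic_distance(s: str, t: str) -> float:
    """Levenshtein distance with unit indel cost and class-aware substitutions."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,               # delete a
                curr[j - 1] + 1.0,           # insert b
                prev[j - 1] + sub_cost(a, b) # substitute a -> b
            ))
        prev = curr
    return prev[-1]


def set_median(transcriptions: Sequence[str]) -> str:
    """Return the candidate with the smallest total distance to all candidates."""
    if not transcriptions:
        raise ValueError("need at least one transcription")
    return min(
        transcriptions,
        key=lambda c: sum(phonetic_distance(c, o) for o in transcriptions),
    )


if __name__ == "__main__":
    # Made-up transcriptions of the same utterance, for illustration only.
    print(set_median(["kasa", "gasa", "kaza", "tomo"]))
```

A set median can only return one of the submitted transcriptions. A true string average could combine parts of several candidates, which this sketch does not attempt.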