Investigating Transcription Normalization in the Faetar ASR Benchmark
This work addresses transcription normalization for low-resource ASR, but its findings are incremental: they confirm existing difficulties without offering a major breakthrough.
The study investigated transcription inconsistencies in the Faetar ASR benchmark, finding they are not the main challenge, and showed that lexicon-constrained decoding can be beneficial while bigram language modeling is not.
We examine the role of transcription inconsistencies in the Faetar Automatic Speech Recognition benchmark, a challenging low-resource ASR benchmark. With the help of a small, hand-constructed lexicon, we find that, while inconsistencies do exist in the transcriptions, they are not the main challenge in the task. We also demonstrate that bigram word-based language modelling is of no added benefit, but that constraining decoding to a finite lexicon can be beneficial. The task remains extremely difficult.
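To make the idea of restricting output to a finite lexicon concrete, the following is a minimal sketch and not the method used in the study. It assumes a simplified post-hoc variant: each whitespace-separated word in a finished hypothesis is replaced by its nearest lexicon entry under Levenshtein distance. True lexicon-constrained decoding would apply the restriction during the search itself. The function names and the toy lexicon are hypothetical, and the example words are invented placeholders, not Faetar data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def constrain_to_lexicon(hypothesis: str, lexicon: list[str]) -> str:
    """Replace every word in the hypothesis with its closest lexicon entry.

    Ties are broken alphabetically so the result is deterministic.
    """
    if not lexicon:
        raise ValueError("lexicon must contain at least one entry")
    snapped = []
    for word in hypothesis.split():
        best = min(lexicon, key=lambda entry: (edit_distance(word, entry), entry))
        snapped.append(best)
    return " ".join(snapped)


# Hypothetical toy lexicon and hypothesis, for illustration only.
toy_lexicon = ["casa", "cane", "pane"]
print(constrain_to_lexicon("caza pone", toy_lexicon))
```

A post-hoc mapping like this can only repair near-misses. If the recognizer's output is far from every lexicon entry, the nearest entry may still be wrong, which is consistent with the observation that the task remains difficult even with a lexicon.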