CL, AS · Dec 8, 2025

TeluguST-46: A Benchmark Corpus and Comprehensive Evaluation for Telugu-English Speech Translation

arXiv:2512.07265v1 · 1 citation · h-index: 20 · IJCNLP-AACL
Originality: Synthesis-oriented
AI Analysis

This paper provides a benchmark and evaluation guidance for speech translation in Telugu, a low-resource language with over 80 million speakers. Its contribution is incremental: it applies existing methods to new data.

The authors address the lack of speech translation resources for Telugu by building a 46-hour benchmark corpus and comparing cascaded and end-to-end architectures. IndicWhisper + IndicMT performed best, but fine-tuned SeamlessM4T was competitive while using significantly less Telugu-specific training data. For this language pair, traditional metrics such as BLEU discriminated translation quality more reliably than BERTScore.
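The architectural difference being compared can be pictured as function composition. The sketch below is illustrative only: the type aliases and the toy functions `toy_asr`, `toy_mt`, and `toy_end_to_end` are hypothetical stand-ins, not the APIs of IndicWhisper, IndicMT, or SeamlessM4T. It shows that a cascade chains a speech recognizer into a text translator, so recognition errors flow into translation, while an end-to-end model is a single speech-to-text-translation callable.

```python
from typing import Callable, Sequence

# Illustrative type aliases (assumptions, not from the paper):
# audio is a sequence of float samples; text is a plain string.
Audio = Sequence[float]
ASR = Callable[[Audio], str]       # Telugu speech -> Telugu transcript
MT = Callable[[str], str]          # Telugu text -> English text
SpeechMT = Callable[[Audio], str]  # Telugu speech -> English text


def cascade(asr: ASR, mt: MT) -> SpeechMT:
    """Compose ASR and text MT into one speech translation system.

    Any ASR error is passed unchanged to the MT stage."""
    def run(audio: Audio) -> str:
        transcript = asr(audio)
        return mt(transcript)
    return run


# Toy stand-ins for real models (hypothetical, for illustration only).
def toy_asr(audio: Audio) -> str:
    return "namaskaram"


def toy_mt(text: str) -> str:
    return {"namaskaram": "hello"}.get(text, "<unk>")


def toy_end_to_end(audio: Audio) -> str:
    # An end-to-end model is a single SpeechMT callable with no
    # intermediate transcript.
    return "hello"


if __name__ == "__main__":
    cascaded: SpeechMT = cascade(toy_asr, toy_mt)
    print(cascaded([0.0, 0.1]))        # hello
    print(toy_end_to_end([0.0, 0.1]))  # hello
```

Because both systems share the `SpeechMT` signature, the same evaluation code can score either one, which is what makes a like-for-like comparison on a shared test split straightforward.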

Despite Telugu being spoken by over 80 million people, speech translation research for this morphologically rich language remains severely underexplored. We address this gap by developing a high-quality Telugu-English speech translation benchmark from 46 hours of manually verified CSTD corpus data (30h/8h/8h train/dev/test split). Our systematic comparison of cascaded versus end-to-end architectures shows that while IndicWhisper + IndicMT achieves the highest performance due to extensive Telugu-specific training data, finetuned SeamlessM4T models demonstrate remarkable competitiveness despite using significantly less Telugu-specific training data. This finding suggests that with careful hyperparameter tuning and sufficient parallel data (potentially less than 100 hours), end-to-end systems can achieve performance comparable to cascaded approaches in low-resource settings. Our metric reliability study evaluating BLEU, METEOR, ChrF++, ROUGE-L, TER, and BERTScore against human judgments reveals that traditional metrics provide better quality discrimination than BERTScore for Telugu-English translation. The work delivers three key contributions: a reproducible Telugu-English benchmark, empirical evidence of competitive end-to-end performance potential in low-resource scenarios, and practical guidance for automatic evaluation in morphologically complex language pairs.
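One common way to check a metric against human judgments is a rank correlation between per-segment metric scores and human ratings. The summary above does not say which statistic the authors used, so the sketch below assumes Spearman's rho; the function names are hypothetical and any inputs are placeholders, not the paper's data.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(metric: Sequence[float], human: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(metric) != len(human):
        raise ValueError("metric and human scores must align per segment")
    return pearson(average_ranks(metric), average_ranks(human))
```

For error-rate metrics such as TER, where lower is better, a reliable metric shows a strongly negative rho; negating TER scores before correlating puts all metrics on the same "higher is better" footing.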
