CL | Mar 8, 2024

FFSTC: Fongbe to French Speech Translation Corpus

arXiv:2403.05488v182 citations | h-index: 10 | LREC
Originality: Synthesis-oriented
AI Analysis

This paper provides a new dataset for speech translation from Fongbe, a low-resource language, into French, filling a gap for researchers in computational linguistics.

The authors introduce the first Fongbe to French Speech Translation Corpus (FFSTC), containing approximately 31 hours of Fongbe audio with French transcriptions, and establish baseline BLEU scores of 8.96 for Fairseq's transformer_s model and 8.14 for its conformer model.

In this paper, we introduce the Fongbe to French Speech Translation Corpus (FFSTC) for the first time. This corpus encompasses approximately 31 hours of collected Fongbe language content, featuring both French transcriptions and corresponding Fongbe voice recordings. FFSTC is a comprehensive dataset compiled through various collection methods and the efforts of dedicated individuals. Furthermore, we conduct baseline experiments using Fairseq's transformer_s and conformer models to evaluate data quality and validity. Our results indicate a BLEU score of 8.96 for the transformer_s model and 8.14 for the conformer model, establishing a baseline for the FFSTC corpus.
