CL CV Nov 26, 2025

Bangla Sign Language Translation: Dataset Creation Challenges, Benchmarking and Prospects

Husne Ara Rubaiyeat, Hasan Mahmud, Md Kamrul Hasan

arXiv:2511.21533v1 2.71 citations h-index: 16

Originality: Synthesis-oriented

AI Analysis

This work addresses a critical need for assistive technology in a low-resource language community. Its contribution is incremental, since it focuses on dataset creation and benchmarking rather than on novel translation methods.

The authors tackled the lack of resources for Bangla Sign Language Translation by creating the IsharaKhobor dataset and its subsets, which are publicly available to enable AI-based assistive tools for the deaf and hard-of-hearing Bangla-speaking community.

Bangla Sign Language Translation (BdSLT) has so far been severely constrained because the language itself is very low-resource. Creating a standard sentence-level dataset for BdSLT is of immense importance for developing AI-based assistive tools for deaf and hard-of-hearing people in the Bangla-speaking community. In this paper, we present a dataset, IsharaKhobor, and two subsets of it to enable research. We also describe the challenges encountered in developing the dataset and suggest a way forward by benchmarking with landmark-based raw and RQE embeddings. We perform ablations on restricting the vocabulary and canonicalizing it within the dataset, which resulted in two more datasets, IsharaKhobor_small and IsharaKhobor_canonical_small. The dataset is publicly available at: www.kaggle.com/datasets/hasanssl/isharakhobor [1].
