CL AI Jul 7, 2025

iLSU-T: an Open Dataset for Uruguayan Sign Language Translation

Ariel E. Stassi, Yanina Boria, J. Matías Di Martino, Gregory Randall

arXiv:2507.21104v1 h-index: 2 FG

Originality: Synthesis-oriented

AI Analysis

The dataset is a domain-specific resource for improving accessibility and inclusion for Uruguayan Sign Language users, though the contribution is incremental because it adapts existing methods to new data.

The authors tackled the lack of localized data for Uruguayan Sign Language translation by creating iLSU-T, an open dataset with over 185 hours of interpreted videos, and established baselines using three state-of-the-art algorithms to evaluate its usefulness.

Automatic sign language translation has gained particular interest in the computer vision and computational linguistics communities in recent years. Given the particularities of each country's sign language, machine translation requires local data to develop new techniques and adapt existing ones. This work presents iLSU-T, an open dataset of interpreted Uruguayan Sign Language RGB videos with audio and text transcriptions. This type of multimodal and curated data is paramount for developing novel approaches to understand or generate tools for sign language processing. iLSU-T comprises more than 185 hours of interpreted sign language videos from public TV broadcasting. It covers diverse topics and includes the participation of 18 professional sign language interpreters. A series of experiments using three state-of-the-art translation algorithms is presented. The aim is to establish a baseline for this dataset and to evaluate both its usefulness and the proposed data processing pipeline. The experiments highlight the need for more localized datasets for sign language translation and understanding, which are critical for developing novel tools to improve accessibility and inclusion of all individuals. Our data and code can be accessed.
