RoCoISLR: A Romanian Corpus for Isolated Sign Language Recognition
This work addresses the communication gap for deaf communities in Romania by providing a foundational dataset, though its contribution is incremental, since it applies existing methods to a new language.
The authors tackled the lack of a large-scale dataset for Romanian Isolated Sign Language Recognition by introducing RoCoISLR, a corpus with over 9,000 video samples across nearly 6,000 glosses, and benchmarked seven models, with Swin Transformer achieving a Top-1 accuracy of 34.1%.
Automatic sign language recognition plays a crucial role in bridging the communication gap between deaf communities and hearing individuals; however, most available datasets focus on American Sign Language. For Romanian Isolated Sign Language Recognition (RoISLR), no large-scale, standardized dataset exists, which limits research progress. In this work, we introduce a new corpus for RoISLR, named RoCoISLR, comprising over 9,000 video samples that span nearly 6,000 standardized glosses from multiple sources. We establish benchmark results by evaluating seven state-of-the-art video recognition models (I3D, SlowFast, Swin Transformer, TimeSformer, Uniformer, VideoMAE, and PoseConv3D) under consistent experimental setups, and compare their performance with results on the widely used WLASL2000 corpus. According to the results, transformer-based architectures outperform convolutional baselines; Swin Transformer achieved a Top-1 accuracy of 34.1%. Our benchmarks highlight the challenges associated with long-tail class distributions in low-resource sign languages, and RoCoISLR provides an initial foundation for systematic RoISLR research.
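To make the reported metric concrete, the following is a minimal sketch of how Top-k accuracy is commonly computed for a classifier, together with the average number of samples per class. It is not the authors' evaluation code: the function names `top_k_accuracy` and `samples_per_class` and the toy scores and labels are hypothetical and chosen only for illustration. With over 9,000 videos spread across nearly 6,000 glosses, the corpus averages roughly 1.5 samples per gloss, which is one way to see why the long-tail distribution makes the task hard.

```python
from collections import Counter


def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def samples_per_class(labels):
    """Average number of samples per distinct class (gloss) in the label list."""
    return len(labels) / len(Counter(labels))


# Toy example (hypothetical data): 4 samples, 3 classes.
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
]
labels = [0, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"Top-1: {top1:.2f}, Top-2: {top2:.2f}")
print(f"Average samples per class: {samples_per_class(labels):.2f}")
```

In this toy case the two samples of the under-represented class 2 are both missed at k=1, which illustrates on a small scale how rare classes can pull Top-1 accuracy down.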