CV Sep 3, 2024

Less is more: concatenating videos for Sign Language Translation from a small set of signs

David Vinicius da Silva, Valter Estevam, David Menotti

arXiv:2409.01506v1 2.0 Has Code

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in sign language translation, offering researchers a low-cost method to create or extend datasets. It is an incremental advance because it builds on existing techniques.

The paper tackles the limited labeled data problem for Brazilian Sign Language (Libras) to Portuguese translation by generating training content through concatenating short clips of isolated signs, achieving BLEU-4 and METEOR scores of 9.2% and 26.2% respectively.

The limited amount of labeled data for training Brazilian Sign Language (Libras) to Portuguese translation models is a challenging problem due to video collection and annotation costs. This paper proposes generating sign language content for training Sign Language Translation models by concatenating short clips containing isolated signs. We employ the V-LIBRASIL dataset, composed of 4,089 sign videos for 1,364 signs, interpreted by at least three persons, to create hundreds of thousands of sentences with their respective Libras translations, which are then fed to the model. More specifically, we run several experiments varying the vocabulary size and sentence structure, generating datasets with approximately 170K, 300K, and 500K videos. Our results achieve BLEU-4 and METEOR scores of 9.2% and 26.2%, respectively. Our technique enables the creation or extension of existing datasets at a much lower cost than collecting and annotating thousands of sentences, providing clear directions for future work.
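The concatenation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `concatenate_signs`, the sign labels, the frame identifiers, and the random choice of one recording per sign are all assumptions made for illustration, and frames are stand-in strings rather than decoded video.

```python
import random


def concatenate_signs(glosses, clips_by_sign, rng):
    """Join isolated-sign clips, in sentence order, into one frame sequence.

    glosses: ordered sign labels for the sentence (hypothetical labels here).
    clips_by_sign: maps a sign label to a list of clips, one per recording;
        each clip is a list of frames.
    rng: a random.Random instance, so the selection is reproducible.
    """
    frames = []
    for gloss in glosses:
        candidates = clips_by_sign[gloss]  # raises KeyError for unknown signs
        clip = rng.choice(candidates)      # assumption: pick one recording at random
        frames.extend(clip)                # append the clip's frames in order
    return frames


# Toy stand-in for a dataset of isolated signs (labels and frames are made up).
clips_by_sign = {
    "EU": [["EU/a/0", "EU/a/1"], ["EU/b/0", "EU/b/1", "EU/b/2"]],
    "GOSTAR": [["GOSTAR/a/0"], ["GOSTAR/b/0", "GOSTAR/b/1"]],
    "CAFE": [["CAFE/a/0", "CAFE/a/1"]],
}

rng = random.Random(0)
video = concatenate_signs(["EU", "GOSTAR", "CAFE"], clips_by_sign, rng)
print(len(video), "frames")
```

Because each sign has several recordings, repeated calls with different random states yield different videos for the same sentence, which is one plausible way a small set of signs could expand into a large training set.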
