SignIT: A Comprehensive Dataset and Multimodal Analysis for Italian Sign Language Recognition
This work addresses sign language recognition for Italian Sign Language (LIS) by providing a dataset and a benchmark, but it is incremental, since it applies existing methods to new data.
The authors introduced SignIT, a new dataset for Italian Sign Language recognition containing 644 videos (3.33 hours) annotated with 94 sign classes, and benchmarked state-of-the-art models to show their limitations on this challenging task.
In this work we present SignIT, a new dataset for studying Italian Sign Language (LIS) recognition. The dataset is composed of 644 videos totalling 3.33 hours. We manually annotated the videos according to a taxonomy of 94 distinct sign classes belonging to 5 macro-categories: Animals, Food, Colors, Emotions and Family. We also extracted 2D keypoints for the hands, face and body of the signers. Using the dataset, we propose a benchmark for the sign recognition task, evaluating several state-of-the-art models to show how temporal information, 2D keypoints and RGB frames influence their performance. Results show the limitations of these models on this challenging LIS dataset. We release data and annotations at the following link: https://fpv-iplab.github.io/SignIT/.
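As a rough illustration of the figures above, the sketch below derives the average video length (3.33 hours over 644 videos, about 18.6 seconds per video) and shows one possible way to validate a label record against the 94-class, 5-macro-category taxonomy. The `SignAnnotation` record and its field names are hypothetical and are not taken from the released files, whose actual layout may differ; the average also says nothing about how individual video durations are distributed.

```python
from dataclasses import dataclass

# Figures reported in the SignIT abstract.
NUM_VIDEOS = 644
TOTAL_HOURS = 3.33
NUM_CLASSES = 94
MACRO_CATEGORIES = ("Animals", "Food", "Colors", "Emotions", "Family")


def mean_clip_seconds(total_hours: float, num_videos: int) -> float:
    """Average video length in seconds (total duration divided evenly)."""
    return total_hours * 3600 / num_videos


@dataclass(frozen=True)
class SignAnnotation:
    # Hypothetical record layout for one annotated sign; not the official format.
    video_id: str
    class_id: int        # assumed index into the 94 sign classes
    macro_category: str  # one of MACRO_CATEGORIES

    def __post_init__(self) -> None:
        # Reject labels that fall outside the taxonomy described in the abstract.
        if not 0 <= self.class_id < NUM_CLASSES:
            raise ValueError(f"class_id must be in [0, {NUM_CLASSES})")
        if self.macro_category not in MACRO_CATEGORIES:
            raise ValueError(f"unknown macro-category: {self.macro_category}")


if __name__ == "__main__":
    print(f"{mean_clip_seconds(TOTAL_HOURS, NUM_VIDEOS):.1f} s per video")
    print(SignAnnotation(video_id="example_0001", class_id=0, macro_category="Food"))
```

Validating in `__post_init__` keeps malformed labels from entering a training pipeline silently; a real loader would map class indices to the official class names published with the dataset.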