CL · CV · May 21, 2022

Unsupervised Sign Language Phoneme Clustering using HamNoSys Notation

arXiv:2205.10560v1 · 1 citation · h-index: 19
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more diverse and accessible sign language resources for researchers and developers, though it appears incremental as it builds on existing notation and unsupervised techniques.

The paper tackles the problem of limited and costly sign language data by proposing an unsupervised method to automatically generate and annotate sign language corpora from online videos, using phoneme clustering based on HamNoSys notation.

Traditionally, sign language resources have been collected in controlled settings for specific tasks involving supervised sign classification or linguistic studies, accompanied by a specific annotation type. To date, very few studies have explored signing videos found online on social media platforms, or the use of unsupervised methods applied to such resources. The field's effort to achieve acceptable model performance on data that differs from that seen during training calls for more diversity in sign language data, stepping away from data obtained in controlled laboratory settings. Moreover, since sign language data collection and annotation carry large overheads, it is desirable to accelerate the annotation process. Considering these tendencies, this paper takes the side of harvesting online data in pursuit of automatically generating and annotating sign language corpora through phoneme clustering.

