CV · Apr 22, 2022

Keypoint based Sign Language Translation without Glosses

Youngmin Kim, Minji Kwak, Dain Lee, Yeongeun Kim, Hyeongboo Baek

arXiv:2204.10511v25.710 citations · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making sign language accessible to non-disabled people by translating it directly into spoken language. It appears incremental, since it builds on existing translation models and adds specific enhancements.

The paper tackles the problem of translating sign language videos directly into spoken language without using glosses. It proposes a keypoint normalization method and a stochastic frame selection method, and its quantitative experiments show that both improve performance.

Sign Language Translation (SLT) has received relatively little study compared with Sign Language Recognition (SLR). SLR, however, only recognizes the unique grammar of sign language, which differs from that of spoken language, so its output is not easy for non-disabled people to interpret. We therefore address the problem of translating sign language video directly into spoken language. To this end, we propose a new keypoint normalization method that performs translation based on the signer's skeleton points and normalizes those points robustly for sign language translation. Customizing the normalization to each body part contributed to improved performance. In addition, we propose a stochastic frame selection method that enables frame augmentation and sampling at the same time. Finally, the resulting keypoint sequence is translated into spoken language by an attention-based translation model. Because it does not rely on glosses, our method can be applied to a variety of datasets, including those without gloss annotations. Quantitative experimental evaluation demonstrated the effectiveness of our method.
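
The abstract names two preprocessing ideas without giving their details, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that stochastic frame selection means drawing one random frame from each equal-length segment of the video, and that body-part normalization means centering and scaling each part's keypoints independently. The function names (`select_frames`, `normalize_part`, `normalize_keypoints`) and the part labels are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def select_frames(num_frames: int, num_samples: int, rng: random.Random) -> List[int]:
    """Pick one random frame index from each of `num_samples` equal segments.

    Assumption: segment-wise random sampling is one plausible reading of
    "stochastic frame selection"; the abstract does not specify the scheme.
    Each call yields a different subset (augmentation) of fixed length (sampling).
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    indices = []
    for i in range(num_samples):
        start = i * num_frames // num_samples
        end = (i + 1) * num_frames // num_samples
        # Clips shorter than num_samples produce empty segments;
        # widen them to one frame so a frame is simply repeated.
        end = max(end, start + 1)
        indices.append(rng.randrange(start, end))
    return indices


def normalize_part(points: Sequence[Point]) -> List[Point]:
    """Center one body part on its mean and scale by its largest extent.

    Assumption: the paper's per-part rules may differ; this is a generic choice.
    """
    if not points:
        return []
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx = sum(xs) / len(points)
    cy = sum(ys) / len(points)
    # Fall back to 1.0 when all points coincide, to avoid dividing by zero.
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / scale, (y - cy) / scale) for x, y in points]


def normalize_keypoints(parts: Dict[str, Sequence[Point]]) -> Dict[str, List[Point]]:
    """Normalize each named body part (e.g. "hand", "face") independently."""
    return {name: normalize_part(pts) for name, pts in parts.items()}
```

Under these assumptions, calling `select_frames(100, 10, random.Random())` on every training pass would feed the translation model a different 10-frame view of the same 100-frame clip, while normalizing each part separately keeps small regions such as the hands from being dwarfed by whole-body scale.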
