CL | IV | May 11, 2025

TSLFormer: A Lightweight Transformer Model for Turkish Sign Language Recognition Using Skeletal Landmarks

arXiv:2505.07890v4 | 4 citations | h-index: 25
Originality: Incremental advance
AI Analysis

This work addresses sign language recognition for hearing-impaired individuals, offering a real-time, mobile solution, though it is incremental as it applies existing transformer methods to a new domain.

The study tackled Turkish Sign Language recognition by proposing TSLFormer, a lightweight transformer model that uses 3D joint positions from skeletal landmarks, achieving competitive performance on the AUTSL dataset with over 36,000 samples and 227 words.

This study presents TSLFormer, a lightweight and robust word-level Turkish Sign Language (TSL) recognition model that treats sign gestures as ordered, string-like language. Instead of using raw RGB or depth videos, our method works only with 3D joint positions (articulation points) extracted with Google's MediaPipe library, focusing on hand and torso skeletal locations. This reduces input dimensionality efficiently while preserving important semantic gesture information. Our approach recasts sign language recognition as sequence-to-sequence translation, inspired by the linguistic nature of sign languages and the success of transformers in natural language processing. Because TSLFormer uses the self-attention mechanism, it captures temporal co-occurrence within gesture sequences and highlights meaningful motion patterns as words unfold. Evaluated on the AUTSL dataset, which contains over 36,000 samples covering 227 different words, TSLFormer achieves competitive performance at minimal computational cost. These results show that joint-based input is sufficient for enabling real-time, mobile, and assistive communication systems for hearing-impaired individuals.
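
To make the joint-based input and the role of self-attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The joint layout (two hands of 21 landmarks each, as MediaPipe Hands provides, plus a hypothetical 8 upper-body joints), the 16-frame clip length, and the choice Q = K = V without learned projections are all assumptions made for illustration.

```python
import math
import random

# Assumed layout (hypothetical; the exact joint selection is not stated here):
# 21 landmarks per hand from MediaPipe Hands, plus 8 illustrative torso joints.
NUM_JOINTS = 21 * 2 + 8
FEATURES_PER_FRAME = NUM_JOINTS * 3  # one (x, y, z) triple per joint


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention with Q = K = V = frames.

    A real transformer layer applies learned linear projections first;
    they are omitted here to keep the sketch short.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        row = softmax(scores)
        weights.append(row)
        # Each output frame is a weighted mix of all frames in the sequence.
        outputs.append(
            [sum(row[t] * frames[t][j] for t in range(len(frames))) for j in range(dim)]
        )
    return outputs, weights


random.seed(0)
NUM_FRAMES = 16  # hypothetical clip length
sequence = [
    [random.uniform(-1.0, 1.0) for _ in range(FEATURES_PER_FRAME)]
    for _ in range(NUM_FRAMES)
]
attended, attn = self_attention(sequence)
```

Under these assumptions each frame is a vector of 150 numbers, whereas a hypothetical 512 x 512 RGB frame would hold 786,432 values. Each row of `attn` is a probability distribution over frames, which is how attention can weight the frames that carry the most informative motion.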

