CL | Nov 17, 2025

A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition

arXiv:2511.13126v1 | h-index: 1 | PCI

Originality: Synthesis-oriented

AI Analysis

The paper addresses architecture selection for sign language recognition systems and offers guidance based on the trade-off between accuracy and computational efficiency. Its contribution is incremental, since it applies existing architectures to this domain rather than proposing new ones.

This study compared recurrent (ConvLSTM) and attention-based (Vanilla Transformer) neural architectures for isolated sign language recognition, finding that the Transformer outperformed ConvLSTM in accuracy, achieving up to 76.8% Top-1 on AzSLD and 88.3% on WLASL, while ConvLSTM was more computationally efficient.

This study presents a systematic comparative analysis of recurrent and attention-based neural architectures for isolated sign language recognition. We implement and evaluate two representative models, ConvLSTM and Vanilla Transformer, on the Azerbaijani Sign Language Dataset (AzSLD) and the Word-Level American Sign Language (WLASL) dataset. Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy across datasets, achieving up to 76.8% Top-1 accuracy on AzSLD and 88.3% on WLASL. The ConvLSTM, while more computationally efficient, lags in recognition accuracy, particularly on smaller datasets. These findings highlight the complementary strengths of each paradigm: the Transformer excels in overall accuracy and signer independence, whereas the ConvLSTM offers advantages in computational efficiency and temporal modeling. The study provides a nuanced analysis of these trade-offs, offering guidance for architecture selection in sign language recognition systems depending on application requirements and resource constraints.
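
For readers unfamiliar with the Top-1 and Top-5 metrics reported above, the following minimal Python sketch shows how such accuracies are commonly computed from per-class scores. The function name `top_k_accuracy`, the six-class setup, and the toy scores are illustrative assumptions; this is not the authors' evaluation code.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per class (hypothetical model outputs)
    labels: list of true class indices, one per sample
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 4 clips, 6 hypothetical sign classes.
scores = [
    [0.05, 0.60, 0.10, 0.15, 0.07, 0.03],  # true class 1 is ranked first
    [0.30, 0.10, 0.40, 0.12, 0.05, 0.03],  # true class 0 is ranked second
    [0.02, 0.08, 0.10, 0.15, 0.20, 0.45],  # true class 0 is ranked last
    [0.10, 0.05, 0.03, 0.70, 0.08, 0.04],  # true class 3 is ranked first
]
labels = [1, 0, 0, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(f"Top-1: {top1:.2f}, Top-5: {top5:.2f}")
```

Top-5 accuracy is always at least as high as Top-1, because a prediction that is correct at rank 1 is also within the top five.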
