CL · Nov 17, 2025

A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition

arXiv:2511.13126v1 · h-index: 1 · PCI
Originality: Synthesis-oriented
AI Analysis

The paper addresses architecture selection for sign language recognition systems and offers guidance based on the trade-off between accuracy and efficiency, but its contribution is incremental: it applies existing methods to this domain.

This study compared recurrent (ConvLSTM) and attention-based (Vanilla Transformer) neural architectures for isolated sign language recognition, finding that the Transformer outperformed ConvLSTM in accuracy, achieving up to 76.8% Top-1 on AzSLD and 88.3% on WLASL, while ConvLSTM was more computationally efficient.

This study presents a systematic comparative analysis of recurrent and attention-based neural architectures for isolated sign language recognition. We implement and evaluate two representative models, ConvLSTM and Vanilla Transformer, on the Azerbaijani Sign Language Dataset (AzSLD) and the Word-Level American Sign Language (WLASL) dataset. Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy across datasets, achieving up to 76.8% Top-1 accuracy on AzSLD and 88.3% on WLASL. The ConvLSTM, while more computationally efficient, lags in recognition accuracy, particularly on smaller datasets. These findings highlight the complementary strengths of each paradigm: the Transformer excels in overall accuracy and signer independence, whereas the ConvLSTM offers advantages in computational efficiency and temporal modeling. The study provides a nuanced analysis of these trade-offs, offering guidance for architecture selection in sign language recognition systems depending on application requirements and resource constraints.
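The Top-1 and Top-5 figures quoted in the abstract are top-k accuracies: a clip counts as correct when its true sign is among the k highest-scoring classes. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `top_k_accuracy`, the four-class toy scores, and the labels are all illustrative assumptions.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists of class scores (hypothetical model outputs)
    labels: list of true class indices, one per sample
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data: 4 clips, 4 hypothetical sign classes.
scores = [
    [0.7, 0.1, 0.1, 0.1],  # true class 0 is ranked first
    [0.4, 0.3, 0.2, 0.1],  # true class 1 is ranked second
    [0.1, 0.2, 0.3, 0.4],  # true class 0 is ranked last
    [0.2, 0.5, 0.2, 0.1],  # true class 1 is ranked first
]
labels = [0, 1, 0, 1]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"Top-1: {top1:.2f}, Top-2: {top2:.2f}")
```

Top-k accuracy never decreases as k grows, which is why a model's Top-5 figure is always at least as high as its Top-1 figure.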
