SignBart -- New approach with the skeleton sequence for Isolated Sign language Recognition
This work addresses communication barriers for individuals with hearing impairments by providing a more efficient and accurate method for sign language recognition, though it appears incremental as it builds on existing transformer architectures.
The paper tackles the challenge of isolated sign language recognition by proposing a new approach that independently encodes x and y coordinates of skeleton sequences using a BART-based encoder-decoder, achieving 96.04% accuracy on the LSA-64 dataset with only 749,888 parameters.
Sign language recognition is crucial for breaking communication barriers faced by individuals with hearing impairments. However, previous approaches have had to trade efficiency against accuracy. Earlier models, such as RNNs, LSTMs, and GCNs, suffered from vanishing gradients and high computational costs. Transformer-based methods improved performance but were not commonly used. This study presents a novel SLR approach that overcomes the challenge of independently extracting meaningful information from the x and y coordinates of skeleton sequences, which traditional models often treat as inseparable. Using the encoder-decoder of the BART architecture, the model encodes the x and y coordinates independently, while Cross-Attention ensures that their interrelation is maintained. With only 749,888 parameters, the model achieves 96.04% accuracy on the LSA-64 dataset, significantly outperforming previous models with over one million parameters. The model also demonstrates excellent performance and generalization across the WLASL and ASL-Citizen datasets. Ablation studies underscore the importance of coordinate projection, normalization, and the use of multiple skeleton components for boosting model efficacy. This study offers a reliable and effective approach for sign language recognition, with strong potential for enhancing accessibility tools for the deaf and hard of hearing.
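The idea of encoding the x and y coordinates separately and relating them through Cross-Attention can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the helper names (`split_xy`, `normalize`, `cross_attention`), the per-frame min-max normalization, the choice of letting the y stream query the x stream, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def split_xy(frames):
    """Split T frames of (x, y) keypoints into two T x K coordinate streams."""
    xs = [[x for x, _ in frame] for frame in frames]
    ys = [[y for _, y in frame] for frame in frames]
    return xs, ys


def normalize(rows):
    """Min-max scale each frame to [0, 1] (an assumed normalization scheme)."""
    out = []
    for row in rows:
        lo, hi = min(row), max(row)
        span = (hi - lo) or 1.0  # avoid division by zero for constant frames
        out.append([(v - lo) / span for v in row])
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention with no learned projections.

    Each query frame attends over all frames of the other stream, so the
    output mixes information from both coordinate axes.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)]
        )
    return out


# Toy skeleton sequence: 2 frames, 3 keypoints each (hypothetical values).
frames = [
    [(0.10, 0.20), (0.40, 0.60), (0.70, 0.90)],
    [(0.15, 0.25), (0.45, 0.55), (0.75, 0.85)],
]

xs, ys = split_xy(frames)
xs_n, ys_n = normalize(xs), normalize(ys)

# Assumption: the y stream (decoder side) queries the x stream (encoder side).
fused = cross_attention(ys_n, xs_n)
```

In a real model each stream would first pass through a learned projection and self-attention layers; the sketch only shows how two separately processed coordinate streams can be tied back together by attention.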