CV, AI · Feb 4, 2025

Exploiting Ensemble Learning for Cross-View Isolated Sign Language Recognition

Fei Wang, Kun Li, Yiqi Nie, Zhangling Duan, Peng Zou, Zhiliang Wu, Yuwei Wang, Yanyan Wei

arXiv:2502.02196v1 · 18.215 citations · h-index: 17 · Has Code · WWW

Originality: Incremental advance

AI Analysis

This work addresses a critical issue for real-world sign language recognition, where camera angles vary. It is incremental, however, because it builds on existing methods.

The paper tackles cross-view isolated sign language recognition, in which models must recognize signs captured from varying camera angles. It uses an ensemble learning approach built on a Video Swin Transformer and placed 3rd in both the RGB-based and RGB-D-based tracks of the CV-ISLR challenge at WWW 2025.

In this paper, we present our solution to the Cross-View Isolated Sign Language Recognition (CV-ISLR) challenge held at WWW 2025. CV-ISLR addresses a critical issue in traditional Isolated Sign Language Recognition (ISLR): existing datasets predominantly capture sign language videos from a frontal perspective, while real-world camera angles often vary. To recognize sign language accurately from different viewpoints, models must understand gestures from multiple angles, which makes cross-view recognition challenging. To address this, we explore the advantages of ensemble learning, which enhances model robustness and generalization across diverse views. Our approach, built on a multi-dimensional Video Swin Transformer model, leverages this ensemble strategy to achieve competitive performance. Our solution ranked 3rd in both the RGB-based ISLR and RGB-D-based ISLR tracks, demonstrating its effectiveness in handling the challenges of cross-view recognition. The code is available at: https://github.com/Jiafei127/CV_ISLR_WWW2025.
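The abstract does not say how the ensemble members' outputs are combined. One common scheme is score-level (late) fusion: convert each model's class logits to softmax probabilities, take an optionally weighted average, and predict the argmax. The sketch below assumes that scheme. The function names, the weights, and the toy logits (standing in for hypothetical RGB and depth Video Swin Transformer members) are illustrative assumptions, not code from the authors' repository.

```python
from __future__ import annotations

import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(
    model_logits: Sequence[Sequence[float]],
    weights: Sequence[float] | None = None,
) -> tuple[int, list[float]]:
    """Assumed late fusion: weighted average of per-model softmax probabilities.

    model_logits: one logit vector per ensemble member (hypothetical models),
    all over the same sign-class vocabulary.
    Returns (predicted class index, fused probability vector).
    """
    if not model_logits:
        raise ValueError("need at least one model")
    n_classes = len(model_logits[0])
    if any(len(l) != n_classes for l in model_logits):
        raise ValueError("all models must share the class vocabulary")
    if weights is None:
        weights = [1.0] * len(model_logits)  # equal weighting by default
    if len(weights) != len(model_logits):
        raise ValueError("need exactly one weight per model")
    wsum = sum(weights)
    fused = [0.0] * n_classes
    for w, logits in zip(weights, model_logits):
        for c, p in enumerate(softmax(logits)):
            fused[c] += w * p / wsum  # normalized weights keep fused summing to 1
    pred = max(range(n_classes), key=fused.__getitem__)
    return pred, fused


# Hypothetical logits from three ensemble members over 4 sign classes.
rgb_a = [2.0, 1.0, 0.1, -1.0]
rgb_b = [0.5, 2.5, 0.0, -0.5]
depth = [1.5, 1.4, 0.2, -2.0]
pred, probs = ensemble_predict([rgb_a, rgb_b, depth])
print(pred, [round(p, 3) for p in probs])
```

With equal weights, two of the three toy members individually favor class 0, yet the averaged probabilities favor class 1 because the second member is far more confident. This is one way probability averaging differs from simple majority voting; per-model weights could, for example, be tuned on a held-out cross-view split.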

