OLMD: Orientation-aware Long-term Motion Decoupling for Continuous Sign Language Recognition
This work addresses accuracy issues in continuous sign language recognition, an important domain for accessibility, but it appears incremental: it builds on existing methods and adds specific enhancements.
The paper tackles the challenge of multi-orientational and long-term motions in continuous sign language recognition by proposing the OLMD framework, which improves the word error rate (WER) on PHOENIX14 by an absolute 1.6% over the previous state of the art.
The primary challenge in continuous sign language recognition (CSLR) stems from the presence of multi-orientational and long-term motions. However, current research overlooks these crucial aspects, which significantly impacts accuracy. To tackle these issues, we propose a novel CSLR framework, Orientation-aware Long-term Motion Decoupling (OLMD), which efficiently aggregates long-term motions and decouples multi-orientational signals into easily interpretable components. Specifically, our Long-term Motion Aggregation (LMA) module filters out static redundancy while adaptively capturing abundant features of long-term motions. We further enhance orientation awareness by decoupling complex movements into horizontal and vertical components, allowing for motion purification in both orientations. Additionally, we propose two coupling mechanisms, stage coupling and cross-stage coupling, which together enrich multi-scale features and improve the generalization capabilities of the model. Experimentally, OLMD achieves state-of-the-art (SOTA) performance on three large-scale datasets: PHOENIX14, PHOENIX14-T, and CSL-Daily. Notably, we improve the word error rate (WER) on PHOENIX14 by an absolute 1.6% compared to the previous SOTA.
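Since the headline result is reported as WER, a minimal sketch of how the metric is conventionally computed may help: the word-level edit distance (substitutions, deletions, insertions) between the predicted and reference gloss sequences, divided by the reference length. The function name `word_error_rate` and the example glosses are illustrative assumptions and are not taken from the paper or from any dataset's official evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; official CSLR benchmarks ship
    their own evaluation tooling, which may normalise glosses differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one gloss")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis glosses.
    prev = list(range(len(hyp) + 1))
    for i, ref_gloss in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference glosses
        for j, hyp_gloss in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_gloss == hyp_gloss else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up gloss sequences: one substitution and one deletion over four glosses.
print(word_error_rate("MORGEN REGEN NORD WIND", "MORGEN SONNE NORD"))
```

Under this definition, an "absolute" improvement is a difference in WER percentage points between two systems, not a relative reduction.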