CV · Sep 21, 2023

SlowFast Network for Continuous Sign Language Recognition

arXiv:2309.12304v1 · 36 citations · h-index: 36 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses accurate recognition of continuous sign language, which is crucial for improving accessibility for deaf and hard-of-hearing individuals. The contribution appears incremental: it builds on existing SlowFast networks and adds new fusion techniques.

The paper tackles the extraction of spatial and dynamic features for Continuous Sign Language Recognition (CSLR). It uses a SlowFast network with two distinct temporal resolutions and introduces two feature fusion methods, and it reports state-of-the-art performance on PHOENIX14, PHOENIX14-T, and CSL-Daily.

The objective of this work is the effective extraction of spatial and dynamic features for Continuous Sign Language Recognition (CSLR). To accomplish this, we utilise a two-pathway SlowFast network, where the two pathways operate at distinct temporal resolutions to separately capture spatial (hand shapes, facial expressions) and dynamic (movements) information. In addition, we introduce two distinct feature fusion methods, carefully designed for the characteristics of CSLR: (1) Bi-directional Feature Fusion (BFF), which facilitates the transfer of dynamic semantics into spatial semantics and vice versa; and (2) Pathway Feature Enhancement (PFE), which enriches dynamic and spatial representations through auxiliary subnetworks, while avoiding the need for extra inference time. As a result, our model further strengthens spatial and dynamic representations in parallel. We demonstrate that the proposed framework outperforms the current state of the art on popular CSLR datasets, including PHOENIX14, PHOENIX14-T, and CSL-Daily.
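The two-pathway idea and the bi-directional exchange described above can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation. The function names, the stride `alpha`, the use of average pooling (fast to slow) and step repetition (slow to fast), and the additive fusion are all assumptions chosen for illustration, and both pathways are assumed to share one feature width, which a real SlowFast network does not require.

```python
from typing import List, Tuple

Feature = List[float]  # one feature vector per time step


def sample_pathways(num_frames: int, alpha: int = 4) -> Tuple[List[int], List[int]]:
    """Return frame indices for the slow and fast pathways.

    The fast pathway keeps every frame (high temporal resolution);
    the slow pathway keeps every `alpha`-th frame (low temporal resolution).
    `alpha` is a hypothetical stride, not a value taken from the paper.
    """
    fast = list(range(num_frames))
    slow = fast[::alpha]
    return slow, fast


def fast_to_slow(fast_feats: List[Feature], alpha: int) -> List[Feature]:
    """Average each window of up to `alpha` fast steps into one slow step."""
    pooled = []
    for start in range(0, len(fast_feats), alpha):
        window = fast_feats[start:start + alpha]
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled


def slow_to_fast(slow_feats: List[Feature], alpha: int, length: int) -> List[Feature]:
    """Repeat each slow step so the sequence covers `length` fast steps."""
    return [slow_feats[t // alpha] for t in range(length)]


def bidirectional_fuse(
    slow_feats: List[Feature], fast_feats: List[Feature], alpha: int
) -> Tuple[List[Feature], List[Feature]]:
    """Add each pathway's resampled features to the other pathway.

    Assumption: element-wise addition stands in for whatever learned
    fusion the actual model uses.
    """
    to_slow = fast_to_slow(fast_feats, alpha)
    to_fast = slow_to_fast(slow_feats, alpha, len(fast_feats))
    fused_slow = [[a + b for a, b in zip(s, f)] for s, f in zip(slow_feats, to_slow)]
    fused_fast = [[a + b for a, b in zip(f, s)] for f, s in zip(fast_feats, to_fast)]
    return fused_slow, fused_fast
```

With 8 frames and `alpha=4`, the slow pathway sees 2 time steps and the fast pathway sees 8. After fusion each pathway keeps its own temporal length, which is the property that lets information flow in both directions without forcing the two streams onto one resolution.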

Code Implementations: 1 repo
