CV, CL, LG; Sep 4, 2023

Exploring Attention Mechanisms in Integration of Multi-Modal Information for Sign Language Recognition and Translation

arXiv:2309.01860v4; 2 citations; h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency and performance improvements for sign language processing, though it is incremental as it builds on existing multi-modal methods.

The paper tackles sub-optimal multi-modal feature merging and high computational complexity in sign language recognition and translation by proposing a lightweight cross-attention plugin module and 2-stage training. It reports a 0.9 reduction in WER for recognition and a 0.8 increase in BLEU-4 for translation.
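For readers unfamiliar with the recognition metric, the following is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance between a reference and a hypothesis, divided by the reference length. Reporting it as a percentage is an assumption here, as are the function name `word_error_rate` and the example gloss strings, which are purely illustrative and not taken from the paper or its datasets.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * edit_distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Illustrative (made-up) gloss sequences: one substitution and one deletion
# against a four-word reference.
example_wer = word_error_rate("MORGEN REGEN IM NORDEN", "MORGEN SONNE IM")
```

Under this reading, a 0.9 reduction would mean the error percentage drops by 0.9 points, although the summary itself does not state the unit.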

Understanding the intricate, fast-paced movements of body parts is essential for sign language recognition and translation. Adding information that identifies and locates the moving body parts has recently become an active research topic. However, previous work on multi-modal information raises concerns such as sub-optimal methods for merging multi-modal features and models that are too computationally heavy. In our work, we address these issues with a plugin module based on cross-attention, which lets each modality attend to the other. Moreover, we use 2-stage training to remove the dependency on separate feature extractors for the additional modalities in an end-to-end approach, which reduces the concern about computational complexity. In addition, our cross-attention plugin module is very lightweight and adds no significant computational overhead to the original baseline. We evaluate our approaches on the RWTH-PHOENIX-2014 dataset for sign language recognition and the RWTH-PHOENIX-2014T dataset for sign language translation. Our approach reduces the WER by 0.9 on the recognition task and increases the BLEU-4 score by 0.8 on the translation task.
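The idea of letting each modality attend to the other can be illustrated with a small, dependency-free sketch of single-head scaled dot-product cross-attention. This is not the authors' module: the abstract does not describe its internals, so the omission of learned projections, the residual connection, and the names `cross_attention` and `fuse_bidirectional` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head scaled dot-product cross-attention.

    queries: list of d-dimensional vectors from one modality.
    context: list of d-dimensional vectors from the other modality.
    Assumption: learned Q/K/V projections are omitted for brevity, so the
    context vectors serve as both keys and values.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in queries:
        # Similarity of this query to every context vector.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted average of the context vectors.
        attended = [
            sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)
        ]
        # Assumption: a residual connection keeps the original feature.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


def fuse_bidirectional(feats_a, feats_b):
    """Let each modality attend to the other (hypothetical helper)."""
    return cross_attention(feats_a, feats_b), cross_attention(feats_b, feats_a)


# Toy features: 3 time steps for one modality, 2 for the other, dimension 2.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra = [[0.5, 0.5], [2.0, 0.0]]
fused_frames, fused_extra = fuse_bidirectional(frames, extra)
```

Each output sequence keeps the length and feature dimension of its own modality, which is what allows a module of this kind to be dropped into an existing pipeline without changing the downstream layers.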

