CL · AI · Nov 1, 2022

Leveraging Graph-based Cross-modal Information Fusion for Neural Sign Language Translation

arXiv:2211.00526v1 · 9 citations · h-index: 44
Originality: Incremental advance
AI Analysis

This work addresses communication barriers for the deaf community by enhancing translation models, though it appears incremental as it builds on existing end-to-end neural SLT approaches.

The paper tackles the problem of neural sign language translation (SLT) by proposing a graph-based cross-modal fusion method to incorporate word-level semantic knowledge, improving performance on the RWTH-PHOENIX-Weather-2014T dataset.

Sign Language (SL), the mother tongue of the deaf community, is a special visual language that most hearing people cannot understand. In recent years, neural Sign Language Translation (SLT), as a possible way of bridging the communication gap between deaf and hearing people, has attracted widespread academic attention. We found that the current mainstream end-to-end neural SLT models, which try to learn language knowledge in a weakly supervised manner, cannot mine enough semantic information under low-resource data conditions. Therefore, we propose to introduce additional word-level semantic knowledge of sign language linguistics to help improve current end-to-end neural SLT models. Concretely, we propose a novel neural SLT model with multi-modal feature fusion based on a dynamic graph, in which the cross-modal information, i.e. text and video, is first assembled into a dynamic graph according to their correlation, and then the graph is processed by a multi-modal graph encoder to generate multi-modal embeddings for use in the subsequent neural translation models. To the best of our knowledge, we are the first to introduce graph neural networks, for fusing multi-modal information, into neural sign language translation models. Moreover, we conducted experiments on RWTH-PHOENIX-Weather-2014T, a popular publicly available SLT dataset, and the quantitative experiments show that our method can improve the model.
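The two-step fusion the abstract describes (assemble text and video into a graph according to their correlation, then encode the graph) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's method: the abstract does not say how correlation is measured or how the graph encoder is built, so cosine similarity with a fixed `threshold` stands in for the correlation step, and a single weight-free mean-aggregation pass stands in for the multi-modal graph encoder. The function names `build_dynamic_graph` and `graph_encoder_layer` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_dynamic_graph(video_feats, text_feats, threshold=0.5):
    """Assemble video and text nodes into one graph.

    Assumption: a cross-modal edge is added when the cosine similarity of a
    video node and a text node reaches `threshold`. The graph is "dynamic"
    in the sense that its edges depend on the input features.
    """
    nodes = list(video_feats) + list(text_feats)
    n_video, n = len(video_feats), len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so each node keeps its own feature
    for i in range(n_video):
        for j in range(n_video, n):
            if cosine(nodes[i], nodes[j]) >= threshold:
                adj[i][j] = adj[j][i] = 1.0  # undirected cross-modal edge
    return nodes, adj


def graph_encoder_layer(nodes, adj):
    """One mean-aggregation message-passing step (no learned weights)."""
    fused = []
    for i, row in enumerate(adj):
        degree = sum(row)  # at least 1.0 because of the self-loop
        dim = len(nodes[i])
        fused.append([
            sum(row[j] * nodes[j][d] for j in range(len(nodes))) / degree
            for d in range(dim)
        ])
    return fused


# Toy inputs: two video-frame features and one word feature (made up).
video = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.1]]
nodes, adj = build_dynamic_graph(video, text)
fused = graph_encoder_layer(nodes, adj)
```

In this toy input only the first video node is similar enough to the word node to receive an edge, so its fused embedding becomes the average of the two, while the second video node keeps its original feature. A real model would replace the mean aggregation with learned graph layers and pass the fused embeddings to the translation network.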

