CL · Apr 11, 2022

A Token-level Contrastive Framework for Sign Language Translation

Biao Fu, Peigen Ye, Liang Zhang, Pei Yu, Cong Hu, Yidong Chen, Xiaodong Shi

arXiv:2204.04916v3 · 3.0 · 18 citations · h-index: 17 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the communication gap between deaf and hearing people by improving SLT accuracy, though it is incremental because it builds on existing NMT methods.

The paper tackles the problem of limited training data in Sign Language Translation (SLT) by proposing ConSLT, a token-level contrastive learning framework that improves translation quality over strong baselines on the PHOENIX14T and CSL-Daily benchmarks.

Sign Language Translation (SLT) is a promising technology for bridging the communication gap between deaf and hearing people. Recently, researchers have adopted Neural Machine Translation (NMT) methods, which usually require large-scale corpora for training, for SLT. However, publicly available SLT corpora are very limited, which causes token representations to collapse and generated tokens to be inaccurate. To alleviate this issue, we propose ConSLT, a novel token-level Contrastive learning framework for Sign Language Translation, which learns effective token representations by incorporating token-level contrastive learning into the SLT decoding process. Concretely, during decoding ConSLT treats each token and its counterpart generated under a different dropout mask as a positive pair, and then randomly samples K tokens from the vocabulary that do not appear in the current sentence to construct negative examples. We conduct comprehensive experiments on two benchmarks (PHOENIX14T and CSL-Daily) in both end-to-end and cascaded settings. The experimental results demonstrate that ConSLT achieves better translation quality than strong baselines.
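The token-level objective described in the abstract can be illustrated with a small, framework-free Python sketch. It is not the authors' implementation and makes several assumptions: the loss is taken to be InfoNCE over cosine similarities with a temperature, the K negatives are assumed to be represented by their token embeddings, and the two dropout-masked decoder passes are simulated by applying dropout directly to fixed hidden states. All function names (`dropout`, `cosine`, `sample_negatives`, `token_contrastive_loss`) and the `temperature` value are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def dropout(vec: Sequence[float], p: float, rng: random.Random) -> Vector:
    """Inverted dropout: zero each unit with probability p, rescale the survivors."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; the small epsilon guards against all-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def sample_negatives(vocab_size: int, sentence_ids: Sequence[int],
                     k: int, rng: random.Random) -> List[int]:
    """Sample k distinct vocabulary ids that do not occur in the current sentence."""
    excluded = set(sentence_ids)
    candidates = [t for t in range(vocab_size) if t not in excluded]
    return rng.sample(candidates, k)


def token_contrastive_loss(view1: Sequence[Sequence[float]],
                           view2: Sequence[Sequence[float]],
                           neg_embs: Sequence[Sequence[float]],
                           temperature: float = 0.1) -> float:
    """Assumed InfoNCE-style loss, averaged over tokens.

    For token i, the anchor is view1[i], the positive is view2[i] (the same
    token under another dropout mask), and the negatives are neg_embs.
    """
    total = 0.0
    for anchor, positive in zip(view1, view2):
        pos = math.exp(cosine(anchor, positive) / temperature)
        negs = sum(math.exp(cosine(anchor, e) / temperature) for e in neg_embs)
        total += -math.log(pos / (pos + negs))
    return total / len(view1)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size, dim, k = 50, 8, 5
    # Hypothetical embedding table standing in for the decoder's token embeddings.
    embedding = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]
    sentence = [3, 17, 42, 8]
    # Simplification: a real model would run the decoder twice with dropout active.
    hidden = [embedding[t] for t in sentence]
    view1 = [dropout(h, 0.1, rng) for h in hidden]
    view2 = [dropout(h, 0.1, rng) for h in hidden]
    neg_ids = sample_negatives(vocab_size, sentence, k, rng)
    neg_embs = [embedding[t] for t in neg_ids]
    loss = token_contrastive_loss(view1, view2, neg_embs)
    print(f"negatives={neg_ids} loss={loss:.4f}")
```

Excluding tokens of the current sentence from the negative pool keeps a token from being pushed away from words that legitimately co-occur with it. In a full system this term would presumably be added to the standard translation cross-entropy loss; that combination and its weighting are not described in the abstract above.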

