CL · AI · Aug 8, 2023

Gloss Alignment Using Word Embeddings

arXiv:2308.04248v1 · 2 citations · h-index: 56
Originality: Incremental advance
AI Analysis

This work addresses the challenge of annotating sign language translation datasets, enabling better use of unlabeled broadcast data. It is an incremental advance because it builds on existing alignment techniques.

The paper tackles the misalignment between signs automatically spotted in sign language videos and their corresponding subtitles. It proposes a method based on large spoken language models and recovers up to a 33.22 BLEU-1 score in word alignment on the MDGS and BOBSL datasets.
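Because the headline result is a BLEU-1 score, it helps to recall what BLEU-1 measures: clipped unigram precision multiplied by a brevity penalty. The sketch below is a minimal sentence-level, single-reference version on a 0 to 100 scale, assuming the evaluation compares a predicted gloss sequence against a ground-truth gloss sequence for each subtitle. The page does not spell out the exact protocol, and corpus-level BLEU pools the counts over all sentences before dividing, so this is illustrative only.

```python
import math
from collections import Counter


def bleu1(candidate: list[str], reference: list[str]) -> float:
    """Sentence-level BLEU-1 (single reference), scaled to 0-100."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Clip each candidate unigram by how often it occurs in the reference,
    # so repeating a correct gloss is not rewarded more than once per match.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(candidate)
    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return 100.0 * bp * precision


# Toy gloss sequences (hypothetical, not from the paper's data).
reference = ["HOUSE", "DOG", "RAIN"]
print(bleu1(["DOG", "DOG", "RAIN"], reference))  # clipped precision 2/3, no penalty
print(bleu1(["DOG"], reference))                 # precision 1, penalty exp(-2)
```

The clipping step is what keeps a degenerate output such as `["DOG", "DOG", "DOG"]` from scoring perfectly, and the brevity penalty stops a very short but precise output from doing the same.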

Capturing and annotating sign language datasets is a time-consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained Sign Language Translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitles. However, the lack of sign language annotation limits the usability of this data and has led to the development of automatic annotation techniques such as sign spotting. These spottings are aligned to the video rather than the subtitle, which often results in a misalignment between the subtitle and the spotted signs. In this paper we propose a method for aligning spottings with their corresponding subtitles using large spoken language models. Using a single modality means our method is computationally inexpensive and can be utilized in conjunction with existing alignment techniques. We quantitatively demonstrate the effectiveness of our method on the Meine DGS (MDGS) and BBC-Oxford British Sign Language (BOBSL) datasets, recovering up to a 33.22 BLEU-1 score in word alignment.
