CL, AI, LG · Jul 12, 2022

Building Korean Sign Language Augmentation (KoSLA) Corpus with Data Augmentation Technique

arXiv:2207.05261v1 · 2 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses data scarcity for sign language translation in specific contexts like hospitals, but it is incremental as it applies existing augmentation methods to a new corpus.

The paper tackles data scarcity in sign language translation by building the KoSLA corpus, a multimodal dataset that includes both manual and non-manual signals. It applies data augmentation techniques such as synonym replacement to improve translation performance, and reports significant BLEU scores with the corpus.

We present an efficient corpus framework for sign language translation. Aided by a simple but effective data augmentation technique, our method converts text into annotated forms with minimal information loss. Sign languages are composed of manual signals, non-manual signals, and iconic features. According to professional sign language interpreters, non-manual signals such as facial expressions and gestures play an important role in conveying exact meaning. By considering the linguistic features of sign language, our proposed framework is a first attempt to build a multimodal sign language augmentation corpus (hereinafter the KoSLA corpus) containing both manual and non-manual modalities. The corpus yields promising results in the hospital context, with performance improving on the augmented datasets. To overcome data scarcity, we used data augmentation techniques such as synonym replacement to make better use of the available data and improve the translation model, while maintaining the grammatical and semantic structures of sign language. To verify the effectiveness of the augmentation technique and the usefulness of the corpus, we perform a translation task between ordinary sentences and sign language annotations using two tokenizers. The resulting BLEU scores with the KoSLA corpus were significant.
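
The synonym replacement mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table, the function names `synonym_replace` and `augment_pair`, the example sentence, and the gloss tokens are all hypothetical, and the choice to leave the sign language annotation unchanged while varying only the source sentence is an assumption made here for illustration.

```python
import random

# Hypothetical synonym table for a hospital setting (an assumption for this
# sketch; the summary does not describe the lexicon the paper used).
SYNONYMS = {
    "hurts": ["aches"],
    "stomach": ["belly", "abdomen"],
    "doctor": ["physician"],
}


def synonym_replace(tokens, synonyms, n=1, rng=None):
    """Return a copy of tokens with up to n words swapped for a synonym."""
    if rng is None:
        rng = random.Random()
    out = list(tokens)
    # Only words that appear in the synonym table may be replaced.
    candidates = [i for i, tok in enumerate(out) if tok in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def augment_pair(source_tokens, gloss_tokens, synonyms, n=1, copies=3, seed=0):
    """Build extra (source, gloss) training pairs from one original pair.

    Assumption: the annotation side is copied as-is, so every variant of the
    source sentence maps to the same sign language annotation.
    """
    rng = random.Random(seed)
    seen = {tuple(source_tokens)}
    pairs = []
    for _ in range(copies):
        new_source = synonym_replace(source_tokens, synonyms, n=n, rng=rng)
        key = tuple(new_source)
        if key not in seen:  # skip duplicates of earlier variants
            seen.add(key)
            pairs.append((new_source, list(gloss_tokens)))
    return pairs


if __name__ == "__main__":
    source = "my stomach hurts".split()
    gloss = ["STOMACH", "HURT"]  # hypothetical annotation tokens
    for new_source, new_gloss in augment_pair(source, gloss, SYNONYMS, copies=5):
        print(" ".join(new_source), "->", " ".join(new_gloss))
```

Restricting replacement to words found in a curated table is one simple way to keep the grammatical and semantic structure of the sentence intact, which the abstract names as a requirement; a real system would also need to handle Korean morphology before lookup.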

