CV · CL · Jul 22, 2024

FSboard: Over 3 million characters of ASL fingerspelling collected via smartphones

arXiv:2407.15806v1 · 7 citations · h-index: 70
Originality: Synthesis-oriented
AI Analysis

This dataset could give Deaf/Hard of Hearing signers an immediate benefit by enabling fingerspelling recognition, but it is an incremental step: fingerspelling is only a small part of sign language translation.

The authors address the shortage of data for sign language understanding by collecting FSboard, a large American Sign Language fingerspelling dataset recorded on smartphones, and report a baseline character error rate of 11.1%.

Progress in machine understanding of sign languages has been slow and hampered by limited data. In this paper, we present FSboard, an American Sign Language fingerspelling dataset situated in a mobile text entry use case, collected from 147 paid and consenting Deaf signers using Pixel 4A selfie cameras in a variety of environments. Fingerspelling recognition is an incomplete solution that is only one small part of sign language translation, but it could provide some immediate benefit to Deaf/Hard of Hearing signers as more broadly capable technology develops. At >3 million characters in length and >250 hours in duration, FSboard is the largest fingerspelling recognition dataset to date by a factor of >10x. As a simple baseline, we finetune 30 Hz MediaPipe Holistic landmark inputs into ByT5-Small and achieve 11.1% Character Error Rate (CER) on a test set with unique phrases and signers. This quality degrades gracefully when decreasing frame rate and excluding face/body landmarks: plausible optimizations to help models run on device in real time.
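For readers unfamiliar with the metric, the following is a minimal sketch of how a Character Error Rate can be computed. It assumes the common corpus-level definition (total character edit distance divided by total reference characters); the abstract does not say whether the authors aggregate this way or average per phrase. The function names and the example strings are hypothetical and are not taken from FSboard.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER (assumed definition): total edits / total reference chars."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


# Made-up phrases for illustration only: one dropped character in each.
refs = ["hello world", "fsboard"]
hyps = ["helo world", "fsbord"]
print(f"CER = {character_error_rate(refs, hyps):.1%}")
```

In this toy case, 2 edits over 18 reference characters give a CER of about 11.1%, which happens to match the scale of the reported baseline: roughly one character in nine needs correction.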

