Spotter+GPT: Turning Sign Spottings into Sentences with LLMs
This work addresses the translation of sign language into spoken language for accessibility applications; its contribution is incremental, since it combines existing sign spotting and LLM methods rather than introducing new ones.
The paper tackles Sign Language Translation (SLT) by proposing Spotter+GPT, a lightweight framework that uses a sign spotter and LLMs to generate spoken language sentences from videos, eliminating SLT-specific training and reducing computational costs.
Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a lightweight, modular SLT framework, Spotter+GPT, that leverages the power of Large Language Models (LLMs) and avoids heavy end-to-end training. Spotter+GPT breaks down the SLT task into two distinct stages. First, a sign spotter identifies individual signs within the input video. The spotted signs are then passed to an LLM, which transforms them into meaningful spoken language sentences. Because Spotter+GPT eliminates the requirement for SLT-specific training, it significantly reduces computational costs and time requirements. The source code and pretrained weights of the Spotter are available at https://gitlab.surrey.ac.uk/cogvispublic/sign-spotter.
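The two-stage design described above can be illustrated with a minimal sketch. It is not the released implementation: the names spotter_gpt, build_prompt, toy_spotter, and toy_llm, the callable interfaces, and the prompt wording are all assumptions made for illustration. The spotter and the LLM are stubbed with fixed outputs so the example runs without any model or network access.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not the API of the released Spotter):
#   a spotter maps a sequence of video frames to a list of spotted sign glosses,
#   an LLM maps a text prompt to generated text.
Spotter = Callable[[Sequence[object]], List[str]]
LLM = Callable[[str], str]


def build_prompt(glosses: Sequence[str]) -> str:
    """Wrap the spotted glosses in an instruction for the LLM (wording is assumed)."""
    return (
        "Write one fluent spoken-language sentence that conveys the meaning "
        "of these sign glosses, in order: " + " ".join(glosses)
    )


def spotter_gpt(frames: Sequence[object], spotter: Spotter, llm: LLM) -> str:
    """Stage 1: spot signs in the video. Stage 2: let the LLM form a sentence."""
    glosses = spotter(frames)
    if not glosses:
        # Nothing was spotted, so there is nothing to hand to the LLM.
        return ""
    return llm(build_prompt(glosses)).strip()


# Stand-in components with fixed outputs, for demonstration only.
def toy_spotter(frames: Sequence[object]) -> List[str]:
    return ["TOMORROW", "RAIN", "NORTH"]


def toy_llm(prompt: str) -> str:
    return " Tomorrow it will rain in the north. "


if __name__ == "__main__":
    print(spotter_gpt(range(3), toy_spotter, toy_llm))
```

Because the two stages only communicate through a list of glosses and a text prompt, either component can be replaced independently, which reflects the modularity the abstract emphasizes; no joint SLT training step appears anywhere in the sketch.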