CV · CL · May 26, 2023

Motion-Based Sign Language Video Summarization using Curvature and Torsion

arXiv:2305.16801v3
Originality: Incremental advance
AI Analysis

This work addresses the generation of concise summaries of sign language videos. The advance is incremental: it builds on prior 2-D methods by incorporating 3-D motion features.

The paper tackles sign language video summarization by extending 2-D wrist trajectory analysis to 3-D hand motion, using curvature and torsion to identify keyframes, and reports promising results in objective measures, human evaluation, and gloss classification.

An interesting problem in many video-based applications is the generation of short synopses by selecting the most informative frames, a procedure known as video summarization. For sign language videos, the benefits of using the $t$-parameterized counterpart of the curvature of the signer's 2-D wrist trajectory to identify keyframes have recently been reported in the literature. In this paper we extend these ideas by modeling the 3-D hand motion extracted from each frame of the video. To this end we propose a new informative function based on the $t$-parameterized curvature and torsion of the 3-D trajectory. The method used to characterize video frames as keyframes depends on whether the motion occurs in 2-D or 3-D space. Specifically, in the case of 3-D motion we look for the maxima of the harmonic mean of the curvature and torsion of the target's trajectory; in the planar motion case we seek the maxima of the trajectory's curvature. The proposed 3-D feature is experimentally evaluated on sign language videos in terms of (1) objective measures using ground-truth keyframe annotations, (2) human-based evaluation of understanding, and (3) gloss classification, and the results obtained are promising.
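The keyframe criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the textbook differential-geometry formulas for curvature, $\kappa = \lVert r' \times r'' \rVert / \lVert r' \rVert^3$, and torsion, $\tau = (r' \times r'') \cdot r''' / \lVert r' \times r'' \rVert^2$, which may differ from the paper's $t$-parameterized variants. It also assumes finite-difference derivatives on a uniformly sampled trajectory and a simple neighbour comparison for peak picking. The function names `curvature_torsion`, `harmonic_mean_score`, and `local_maxima` are hypothetical.

```python
import numpy as np


def curvature_torsion(points, dt=1.0, eps=1e-12):
    """Estimate curvature and torsion of a sampled 3-D trajectory.

    points: array of shape (n_frames, 3), one 3-D position per frame.
    dt: assumed uniform time step between frames.
    Uses the standard formulas (an assumption; the paper's
    t-parameterized definitions may differ).
    """
    r = np.asarray(points, dtype=float)
    # Finite-difference estimates of the first three derivatives.
    d1 = np.gradient(r, dt, axis=0)
    d2 = np.gradient(d1, dt, axis=0)
    d3 = np.gradient(d2, dt, axis=0)

    cross = np.cross(d1, d2)
    cross_norm = np.linalg.norm(cross, axis=1)
    speed = np.linalg.norm(d1, axis=1)

    kappa = cross_norm / (speed**3 + eps)
    # Row-wise dot product (r' x r'') . r''' over |r' x r''|^2.
    tau = np.einsum("ij,ij->i", cross, d3) / (cross_norm**2 + eps)
    return kappa, tau


def harmonic_mean_score(kappa, tau, eps=1e-12):
    """Harmonic mean of curvature and torsion magnitudes per frame."""
    a = np.abs(kappa)
    b = np.abs(tau)
    return 2.0 * a * b / (a + b + eps)


def local_maxima(score):
    """Indices of interior samples that exceed their left neighbour
    and are at least as large as their right neighbour."""
    s = np.asarray(score, dtype=float)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])
    return np.flatnonzero(is_peak) + 1


if __name__ == "__main__":
    # Synthetic 3-D trajectory standing in for a tracked hand position.
    t = np.linspace(0.0, 4.0 * np.pi, 400)
    traj = np.stack([np.cos(t), np.sin(2.0 * t), 0.5 * np.sin(3.0 * t)], axis=1)
    kappa, tau = curvature_torsion(traj, dt=t[1] - t[0])
    score = harmonic_mean_score(kappa, tau)
    print("candidate keyframes:", local_maxima(score))
```

For a trajectory confined to a plane the torsion vanishes, so this harmonic mean goes to zero everywhere. That is consistent with the abstract's use of curvature alone in the planar case, in which `local_maxima(kappa)` would be the analogous selection step. Real hand trajectories are noisy and third derivatives amplify that noise, so smoothing the positions before differentiating would likely be needed in practice.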

