CV, LG, IV · Mar 23, 2023

Automatic Generation of Labeled Data for Video-Based Human Pose Analysis via NLP applied to YouTube Subtitles

Sebastian Dill, Susi Zhihan, Maurice Rohr, Maziar Sharbafi, Christoph Hoog Antink

arXiv:2304.14489v2 · 1.52 citations · h-index: 20

Originality: Synthesis-oriented

AI Analysis

This work addresses the data bottleneck for at-home exercise monitoring systems, though the contribution is incremental, since it applies existing NLP methods to a new domain.

The authors tackle the scarcity of labeled data for video-based exercise evaluation by automatically generating labels with NLP applied to YouTube subtitles. They show that irrelevant clips ($n=332$) have significantly different joint visibility values than relevant clips ($n=298$).
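The listing does not describe the NLP pipeline in detail. The following minimal Python sketch therefore shows only one way that timed subtitle segments could be turned into the three clip labels. It assumes the subtitles have been exported in SRT format, and it uses a naive keyword heuristic in place of whatever NLP the authors actually applied. The cue-word lists and the names `parse_srt`, `mentions`, and `label` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical cue words for illustration only; not the authors' vocabulary.
EXERCISE_TERMS = ["push-up", "push-ups", "pushup", "pushups", "push up", "push ups"]
ERROR_TERMS = ["mistake", "mistakes", "wrong", "incorrect", "avoid", "don't"]

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to seconds."""
    h, m, s, ms = map(int, TIMESTAMP.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt: str) -> list[Segment]:
    """Parse SRT text (index line, timing line, text lines per block)."""
    segments = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed or empty blocks
        start, end = lines[1].split("-->")
        segments.append(Segment(to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return segments


def mentions(text: str, terms: list[str]) -> bool:
    """True if any term occurs as a whole word or phrase, ignoring case."""
    return any(re.search(rf"\b{re.escape(t)}\b", text, re.IGNORECASE) for t in terms)


def label(seg: Segment) -> str:
    """Map a segment to one of the three classes via the hypothetical keyword heuristic."""
    if not mentions(seg.text, EXERCISE_TERMS):
        return "irrelevant"
    if mentions(seg.text, ERROR_TERMS):
        return "relevant incorrect"
    return "relevant correct"
```

Each segment's `start`/`end` span would then delimit the video clip that inherits its label. A keyword rule like this ignores negation and context. It should therefore be read as a baseline and not as a substitute for a proper NLP model.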

With recent advancements in computer vision and machine learning (ML), video-based at-home exercise evaluation systems have become a popular research topic. However, their performance depends heavily on the amount of available training data. Since labeled datasets specific to exercise are rare, we propose a method that makes use of the abundance of fitness videos available online. Specifically, we exploit the fact that such videos often not only show the exercises but also provide language as an additional source of information. Using push-ups as an example, we show that analyzing subtitle data with natural language processing (NLP) makes it possible to create a labeled dataset (irrelevant, relevant correct, relevant incorrect) containing relevant information for pose analysis. In particular, we show that irrelevant clips ($n=332$) have significantly different joint visibility values compared to relevant clips ($n=298$). Inspecting the cluster centroids also reveals different poses for the different classes.
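The following hedged Python sketch makes the visibility comparison concrete. It reduces each clip to its mean joint visibility. It then compares irrelevant and relevant clips with a two-sided permutation test on the difference of means. The sketch assumes a pose estimator that reports a per-joint visibility score in [0, 1]; MediaPipe Pose landmarks, for instance, carry a `visibility` field. The permutation test is a stand-in chosen for illustration and is not necessarily the test used in the paper. The names `clip_visibility` and `permutation_test` are hypothetical.

```python
import random
from statistics import mean


def clip_visibility(frames: list[list[float]]) -> float:
    """Mean visibility over all frames and joints of one clip.

    Assumption: `frames` holds one list of per-joint visibility scores
    in [0, 1] per video frame, as produced by some pose estimator.
    """
    return mean(v for frame in frames for v in frame)


def permutation_test(a: list[float], b: list[float],
                     n_perm: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Two-sided permutation test on the difference of group means.

    Stand-in test for illustration; returns (|mean(a) - mean(b)|, p-value).
    """
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of clips to the two groups
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy per-clip values, not data from the paper.
    irrelevant = [0.20, 0.30, 0.25, 0.35, 0.30]
    relevant = [0.80, 0.85, 0.90, 0.75, 0.80]
    diff, p = permutation_test(irrelevant, relevant, n_perm=2000)
    print(f"mean difference = {diff:.3f}, p = {p:.4f}")
```

For this study, `a` would hold the 332 per-clip values of the irrelevant class and `b` the 298 values of the relevant classes. A small p-value means the observed difference in mean visibility would be unlikely if clips were assigned to the two groups at random.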
