CVMar 23, 2023
Automatic Generation of Labeled Data for Video-Based Human Pose Analysis via NLP applied to YouTube SubtitlesSebastian Dill, Susi Zhihan, Maurice Rohr et al.
With recent advancements in computer vision as well as machine learning (ML), video-based at-home exercise evaluation systems have become a popular topic of current research. However, performance depends heavily on the amount of available training data. Since labeled datasets specific to exercising are rare, we propose a method that makes use of the abundance of fitness videos available online. Specifically, we utilize the advantage that videos often not only show the exercises, but also provide language as an additional source of information. With push-ups as an example, we show that through the analysis of subtitle data using natural language processing (NLP), it is possible to create a labeled (irrelevant, relevant correct, relevant incorrect) dataset containing relevant information for pose analysis. In particular, we show that irrelevant clips ($n=332$) have significantly different joint visibility values compared to relevant clips ($n=298$). Inspecting cluster centroids also show different poses for the different classes.
CVDec 11, 2025
3D Blood Pulsation MapsMaurice Rohr, Tobias Reinhardt, Tizian Dege et al.
We present Pulse3DFace, the first dataset of its kind for estimating 3D blood pulsation maps. These maps can be used to develop models of dynamic facial blood pulsation, enabling the creation of synthetic video data to improve and validate remote pulse estimation methods via photoplethysmography imaging. Additionally, the dataset facilitates research into novel multi-view-based approaches for mitigating illumination effects in blood pulsation analysis. Pulse3DFace consists of raw videos from 15 subjects recorded at 30 Hz with an RGB camera from 23 viewpoints, blood pulse reference measurements, and facial 3D scans generated using monocular structure-from-motion techniques. It also includes processed 3D pulsation maps compatible with the texture space of the 3D head model FLAME. These maps provide signal-to-noise ratio, local pulse amplitude, phase information, and supplementary data. We offer a comprehensive evaluation of the dataset's illumination conditions, map consistency, and its ability to capture physiologically meaningful features in the facial and neck skin regions.
SPFeb 12
Task- and Metric-Specific Signal Quality Indices for Medical Time SeriesJad Haidamous, Christoph Hoog Antink
Medical time series such as electrocardiograms (ECGs) and photoplethysmograms (PPGs) are frequently affected by measurement artifacts due to challenging acquisition environments, such as in ambulances and during routine daily activities. Since automated algorithms for analyzing such signals increasingly inform clinically relevant decisions, identifying signal segments on which these algorithms may produce unreliable outputs is of critical importance. Signal quality indices (SQIs) are commonly used for this purpose. However, most existing SQIs are task agnostic and do not account for the specific algorithm and performance metric used downstream. In this work, we formalize signal quality as a task- and metric-dependent concept and propose a perturbation-based SQI (pSQI) that aims to detect an algorithm's performance degradation on an input signal with respect to a metric. The pSQI is defined as the worst-case value of the performance metric under an additive, colored Gaussian noise perturbation with a lower-bounded signal-to-noise ratio. We introduce formal requirements for task- and metric-specific SQIs, including monotonicity of the metric in expectation and maximal separation under thresholding. Experiments on R-peak detection and atrial fibrillation classification benchmarks demonstrate that the proposed pSQI consistently outperforms existing feature- and deep learning-based SQIs in identifying unreliable inputs without requiring training.