CV, May 8, 2024

Unsupervised Skin Feature Tracking with Deep Neural Networks

arXiv:2405.04943v1
h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the need for accurate skin feature tracking in applications such as heart rate estimation and Parkinson's disease monitoring. It offers a data-efficient solution, though the methodological advance is incremental.

The paper tackles skin feature tracking without extensive labeled data by using an unsupervised convolutional stacked autoencoder with a Gaussian-weighted loss. It reports a mean tracking error of 0.6 to 3.3 pixels, outperforming both traditional and state-of-the-art supervised methods.

Facial feature tracking is essential in imaging ballistocardiography for accurate heart rate estimation, and skin feature tracking enables quantification of motor degradation in Parkinson's disease. While deep convolutional neural networks have shown remarkable accuracy in tracking tasks, they typically require extensive labeled data for supervised training. Our proposed pipeline employs a convolutional stacked autoencoder to match image crops with a reference crop containing the target feature. The autoencoder learns deep feature encodings specific to the object category in an unsupervised manner, which reduces data requirements. To overcome edge effects that make performance dependent on crop size, we introduced a Gaussian weight on the per-pixel residual errors when calculating the loss function. We trained the autoencoder on facial images and validated its performance on manually labeled face and hand videos. Our Deep Feature Encodings (DFE) method demonstrated superior tracking accuracy, with a mean error ranging from 0.6 to 3.3 pixels, outperforming traditional methods such as SIFT, SURF, and Lucas-Kanade, as well as recent transformer-based trackers such as PIPs++ and CoTracker. Overall, our unsupervised learning approach excels in tracking various skin features under significant motion conditions, providing superior feature descriptors for tracking, matching, and image registration compared to both traditional and state-of-the-art supervised learning methods.
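
To make the Gaussian-weighted loss concrete, the following is a minimal sketch of one way such a loss could be computed for a single grayscale crop. It is not the authors' implementation. The abstract does not give the exact form of the weighting, so the centring of the Gaussian on the crop, the `sigma` parameter, the normalisation of the weights to sum to 1, and the function names `gaussian_weights` and `gaussian_weighted_mse` are all assumptions made for illustration. Plain Python lists are used to keep the example dependency-free.

```python
import math


def gaussian_weights(height, width, sigma):
    """Return a height x width grid of Gaussian weights centred on the crop.

    Assumption: the weights are normalised to sum to 1. The source does not
    specify the normalisation or the value of sigma.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    raw = [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


def gaussian_weighted_mse(crop, reconstruction, sigma):
    """Sum of squared per-pixel residuals, each scaled by its Gaussian weight.

    Residuals near the crop centre contribute more than residuals near the
    edges, which is the stated purpose of the weighting.
    """
    height, width = len(crop), len(crop[0])
    weights = gaussian_weights(height, width, sigma)
    return sum(
        weights[y][x] * (crop[y][x] - reconstruction[y][x]) ** 2
        for y in range(height)
        for x in range(width)
    )


# Hypothetical 5x5 crop and two reconstructions that each differ from it
# by the same amount in a single pixel: one at the centre, one at a corner.
crop = [[0.5] * 5 for _ in range(5)]

centre_error = [row[:] for row in crop]
centre_error[2][2] += 0.2

corner_error = [row[:] for row in crop]
corner_error[0][0] += 0.2

loss_centre = gaussian_weighted_mse(crop, centre_error, sigma=1.0)
loss_corner = gaussian_weighted_mse(crop, corner_error, sigma=1.0)
```

With these inputs, the same reconstruction error is penalised more at the crop centre than at a corner, so `loss_centre` is larger than `loss_corner`. A training implementation would more likely compute this on batched tensors in a deep learning framework, but the weighting idea is the same.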

