CalibrationPhys: Self-supervised Video-based Heart and Respiratory Rate Measurements by Calibrating Between Multiple Cameras
This work addresses the need for user-friendly, non-contact health monitoring by reducing reliance on costly labeled data. The contribution is incremental, since it builds on existing video-based techniques.
The paper tackles the expense of collecting ground-truth data for video-based heart and respiratory rate measurement. It proposes CalibrationPhys, a self-supervised method that calibrates between multiple cameras, and reports that it outperforms state-of-the-art methods in experiments on two datasets.
Video-based heart and respiratory rate measurements using facial videos are more useful and user-friendly than traditional contact-based sensors. However, most current deep learning approaches require ground-truth pulse and respiratory waves for model training, which are expensive to collect. In this paper, we propose CalibrationPhys, a self-supervised video-based heart and respiratory rate measurement method that calibrates between multiple cameras. CalibrationPhys trains deep learning models without supervised labels by using facial videos captured simultaneously by multiple cameras. Contrastive learning is performed so that the pulse and respiratory waves predicted from synchronized videos captured by different cameras are treated as positive pairs and those predicted from different videos are treated as negative pairs. CalibrationPhys also improves the robustness of the models by means of a data augmentation technique and successfully leverages a pre-trained model for a particular camera. Experimental results on two datasets demonstrate that CalibrationPhys outperforms state-of-the-art heart and respiratory rate measurement methods. Since we optimize camera-specific models using only videos from multiple cameras, our approach makes it easy to use arbitrary cameras for heart and respiratory rate measurements.
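The contrastive objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss, so everything below is an assumption made for illustration: the function names `normalized_psd` and `contrastive_loss` are hypothetical, waves are compared through their normalized power spectra, and the loss is simply the mean positive-pair distance minus the mean negative-pair distance. It is not the authors' implementation.

```python
import numpy as np


def normalized_psd(wave):
    """Power spectrum of a 1-D wave, scaled so that it sums to 1."""
    wave = wave - wave.mean()                 # remove the DC component
    power = np.abs(np.fft.rfft(wave)) ** 2    # one-sided power spectrum
    return power / (power.sum() + 1e-12)      # guard against division by zero


def contrastive_loss(waves_cam_a, waves_cam_b):
    """Illustrative loss (assumed form, not taken from the paper).

    waves_cam_a[i] and waves_cam_b[i] are waves predicted for the same
    synchronized clip i by two camera-specific models; entries with
    different indices come from different clips. Needs at least 2 clips.
    """
    psd_a = np.stack([normalized_psd(w) for w in waves_cam_a])
    psd_b = np.stack([normalized_psd(w) for w in waves_cam_b])
    # dist[i, j]: mean squared difference between clip i (camera A)
    # and clip j (camera B) in the frequency domain.
    dist = ((psd_a[:, None, :] - psd_b[None, :, :]) ** 2).mean(axis=-1)
    n = dist.shape[0]
    positive = np.trace(dist) / n                            # same clip
    negative = (dist.sum() - np.trace(dist)) / (n * (n - 1))  # other clips
    return positive - negative


# Toy data: two 10-second clips at 30 fps with different pulse frequencies.
t = np.arange(300) / 30.0
cam_a = [np.sin(2 * np.pi * 1.0 * t), np.sin(2 * np.pi * 1.5 * t)]
# Camera B sees the same pulse with a different amplitude and phase.
cam_b = [0.5 * np.sin(2 * np.pi * 1.0 * t + 0.7),
         0.5 * np.sin(2 * np.pi * 1.5 * t + 0.7)]

aligned = contrastive_loss(cam_a, cam_b)         # clips correctly paired
mismatched = contrastive_loss(cam_a, cam_b[::-1])  # pairing deliberately broken
```

Comparing power spectra rather than raw samples makes the toy loss insensitive to the amplitude and phase differences between cameras, so correctly paired clips yield a lower loss than mismatched ones. In a real training setup the waves would come from camera-specific networks and the loss would be minimized with an automatic-differentiation framework.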