CV | Feb 19, 2024

PhySU-Net: Long Temporal Context Transformer for rPPG with Self-Supervised Pre-training

arXiv:2402.11913v16.510 citations | h-index: 4 | ICPR

Originality: Incremental advance

AI Analysis

This work addresses limited data availability and weak temporal modeling in rPPG for healthcare monitoring, though it appears incremental: it combines existing ideas (transformers and self-supervision) in a new domain.

The paper tackles remote photoplethysmography (rPPG), the contactless measurement of cardiac activity from facial videos. It proposes PhySU-Net, a transformer network with long temporal context, together with a self-supervised pre-training strategy, and reports superior performance on the public OBF and VIPL-HR datasets.

Remote photoplethysmography (rPPG) is a promising technology that consists of contactless measuring of cardiac activity from facial videos. Most recent approaches utilize convolutional networks with limited temporal modeling capability or ignore long temporal context. Supervised rPPG methods are also severely limited by scarce data availability. In this work, we propose PhySU-Net, the first long spatial-temporal map rPPG transformer network and a self-supervised pre-training strategy that exploits unlabeled data to improve our model. Our strategy leverages traditional methods and image masking to provide pseudo-labels for self-supervised pre-training. Our model is tested on two public datasets (OBF and VIPL-HR) and shows superior performance in supervised training. Furthermore, we demonstrate that our self-supervised pre-training strategy further improves our model's performance by leveraging representations learned from unlabeled data.
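The abstract says that pseudo-labels for pre-training come from traditional methods and image masking, but it does not give details. The following is only a minimal sketch of that general idea on synthetic data, not the authors' implementation. It assumes a spatial-temporal map of shape (regions, frames, channels), uses the averaged green-channel trace with an FFT peak as a stand-in for a "traditional method", and zeroes out random square patches as the masking step. The function names, patch size, mask ratio, frame count, and frequency band are all hypothetical choices made for illustration.

```python
import numpy as np

def estimate_hr_bpm(signal, fps, lo_hz=0.7, hi_hz=3.0):
    """Pseudo-label: dominant frequency (in bpm) of a 1-D trace inside an assumed heart-rate band."""
    signal = signal - signal.mean()                      # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)           # keep plausible heart-rate frequencies
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

def mask_patches(st_map, patch, ratio, rng):
    """Zero out a random fraction of non-overlapping square patches of the map."""
    masked = st_map.copy()
    n_r, n_t = st_map.shape[0] // patch, st_map.shape[1] // patch
    n_mask = int(ratio * n_r * n_t)
    chosen = rng.choice(n_r * n_t, size=n_mask, replace=False)
    for idx in chosen:
        r, t = divmod(int(idx), n_t)                     # patch row and column
        masked[r * patch:(r + 1) * patch, t * patch:(t + 1) * patch, :] = 0.0
    return masked, n_mask

# Synthetic stand-in for a spatial-temporal map: 64 regions, 576 frames, 3 color channels.
rng = np.random.default_rng(0)
fps, n_frames, n_rois = 30, 576, 64
t = np.arange(n_frames) / fps
pulse = 0.05 * np.sin(2 * np.pi * 1.25 * t)              # 1.25 Hz, i.e. 75 bpm
st_map = 0.5 + pulse[None, :, None] + 0.01 * rng.standard_normal((n_rois, n_frames, 3))

# "Traditional" pseudo-label (assumption): FFT peak of the region-averaged green channel.
green_trace = st_map[:, :, 1].mean(axis=0)
pseudo_hr = estimate_hr_bpm(green_trace, fps)

# Masked input that a network could be asked to process during pre-training.
masked_map, n_masked = mask_patches(st_map, patch=16, ratio=0.5, rng=rng)
```

In a pre-training loop, `masked_map` would be the network input and `pseudo_hr` (or the unmasked map) a training target; how PhySU-Net actually combines these signals is described in the paper itself, not here.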
