Instantaneous Physiological Estimation using Video Transformers
This work addresses the need for continuous physiological monitoring in healthcare, where intermittent readings can delay the detection of critical conditions.
The paper tackles the estimation of instantaneous heart rate and respiration rate from face videos, a task in which prior video-based methods were largely limited to episodic scores over windowed intervals. On the V4V benchmark, the method achieves an instantaneous-MAE of 13.0 beats-per-minute for heart rate and outperforms both shallow and deep-learning-based methods for respiration rate.
Video-based physiological signal estimation has been limited primarily to predicting episodic scores in windowed intervals. While these intermittent values are useful, they provide an incomplete picture of patients' physiological status and may lead to late detection of critical conditions. We propose a video Transformer for estimating instantaneous heart rate and respiration rate from face videos. Physiological signals are typically confounded by alignment errors in space and time. To overcome this, we formulate the loss in the frequency domain. We evaluated the method on the large-scale Vision-for-Vitals (V4V) benchmark. It outperformed both shallow and deep-learning-based methods for instantaneous respiration rate estimation. For heart rate estimation, it achieved an instantaneous-MAE of 13.0 beats-per-minute.
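
To illustrate why a frequency-domain loss can tolerate temporal misalignment, the following is a minimal sketch and not the loss used in the paper, whose exact form the abstract does not state. It assumes a 25 Hz frame rate, synthetic sinusoidal signals, and a mean squared error between normalized power spectra; the function names are hypothetical.

```python
import numpy as np


def power_spectrum(signal: np.ndarray) -> np.ndarray:
    """Return the power spectrum of a 1-D signal, normalized to sum to 1."""
    centered = signal - signal.mean()            # drop the DC component
    power = np.abs(np.fft.rfft(centered)) ** 2   # magnitude discards phase
    total = power.sum()
    return power / total if total > 0 else power


def frequency_domain_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Illustrative loss: MSE between normalized power spectra (assumption)."""
    return float(np.mean((power_spectrum(pred) - power_spectrum(target)) ** 2))


def time_domain_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Plain sample-wise MSE, shown for comparison."""
    return float(np.mean((pred - target) ** 2))


fs = 25.0                    # assumed video frame rate in Hz
t = np.arange(250) / fs      # 10 seconds of samples

target = np.sin(2 * np.pi * 1.2 * t)            # 1.2 Hz, i.e. 72 beats per minute
shifted = np.sin(2 * np.pi * 1.2 * (t - 0.4))   # same rate, delayed by 0.4 s
wrong_rate = np.sin(2 * np.pi * 1.6 * t)        # 1.6 Hz, i.e. 96 beats per minute

print("shifted,    time-domain loss:     ", time_domain_loss(shifted, target))
print("shifted,    frequency-domain loss:", frequency_domain_loss(shifted, target))
print("wrong rate, frequency-domain loss:", frequency_domain_loss(wrong_rate, target))
```

In this sketch the delayed signal has the correct rate, so its spectrum matches the target and the frequency-domain loss stays near zero, while the sample-wise loss penalizes it heavily. The signal with the wrong rate is penalized by the frequency-domain loss, which is the behavior one would want when labels and video frames are imperfectly synchronized.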