The Benefit of Distraction: Denoising Remote Vitals Measurements using Inverse Attention
The method improves camera-based vital sign monitoring for healthcare applications, but the contribution is incremental because it builds on existing attention-based methods.
The paper tackles the problem of denoising remote physiological measurements from video by using inverse attention to estimate noise from non-signal regions, increasing the signal-to-noise ratio by up to 5.8 dB and reducing heart rate and breathing rate estimation errors by up to 30%.
Attention is a powerful concept in computer vision. End-to-end networks that learn to focus selectively on regions of an image or video often perform strongly. However, other image regions, while not necessarily containing the signal of interest, may contain useful context. We present an approach that exploits the idea that statistics of noise may be shared between the regions that contain the signal of interest and those that do not. Our technique uses the inverse of an attention mask to generate a noise estimate that is then used to denoise temporal observations. We apply this to the task of camera-based physiological measurement. A convolutional attention network is used to learn which regions of a video contain the physiological signal and to generate a preliminary estimate. A noise estimate is obtained from the pixel intensities in the inverse regions of the learned attention mask; this estimate is in turn used to refine the estimate of the physiological signal. We perform experiments on two large benchmark datasets and show that this approach produces state-of-the-art results, increasing the signal-to-noise ratio by up to 5.8 dB, reducing heart rate and breathing rate estimation error by as much as 30%, recovering subtle pulse waveform dynamics, and generalizing from RGB to NIR videos without retraining.
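The inverse-attention idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `masked_traces` and `regress_out_noise` are hypothetical, the attention mask is assumed to be given rather than learned by a convolutional network, and a simple least-squares projection stands in for the paper's refinement step. The synthetic video at the end is invented purely to exercise the code.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero for empty masks or flat noise


def masked_traces(frames, attention):
    """Return (signal_trace, noise_trace) from a video and an attention mask.

    frames:    array of shape (T, H, W) with pixel intensities over time.
    attention: array of shape (H, W) with values in [0, 1]; assumed given here.
    """
    inverse = 1.0 - attention  # the inverse attention mask
    # Attention-weighted spatial mean: preliminary estimate (signal plus noise).
    signal = (frames * attention).sum(axis=(1, 2)) / (attention.sum() + EPS)
    # Inverse-weighted spatial mean: noise estimate from non-signal regions.
    noise = (frames * inverse).sum(axis=(1, 2)) / (inverse.sum() + EPS)
    return signal, noise


def regress_out_noise(signal, noise):
    """Remove the component of `signal` explained by `noise`.

    Assumption: a least-squares projection is used as a simple stand-in
    for a learned denoiser.
    """
    s = signal - signal.mean()
    n = noise - noise.mean()
    beta = (s @ n) / (n @ n + EPS)  # least-squares gain of noise in signal
    return s - beta * n


# Synthetic example (hypothetical data): 10 s of video at 30 frames per second.
t = np.arange(300) / 30.0
pulse = np.sin(2 * np.pi * 1.2 * t)        # 1.2 Hz pulse, about 72 beats per minute
noise = 0.8 * np.sin(2 * np.pi * 0.3 * t)  # slow illumination change shared by all pixels

attention = np.zeros((16, 16))
attention[:8, :] = 1.0                     # pretend the top half of the frame is skin

# Every pixel sees the shared noise; only the attended region carries the pulse.
frames = 0.5 + noise[:, None, None] + pulse[:, None, None] * attention[None, :, :]

raw, noise_est = masked_traces(frames, attention)
cleaned = regress_out_noise(raw, noise_est)
```

In this toy setting the noise is identical in the attended and non-attended regions, which is the shared-statistics assumption in its most idealized form. Real video noise is only partly shared between regions, so a fixed linear projection like this one should be read as an intuition aid rather than a substitute for a learned denoiser.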