CV | AI | May 11, 2025

Efficient and Robust Multidimensional Attention in Remote Physiological Sensing through Target Signal Constrained Factorization

arXiv:2505.07013v1 | h-index: 5
Originality: Highly original
AI Analysis

This work addresses reliable non-invasive vital sign monitoring in unconstrained environments for healthcare and human-computer interaction. It represents a strong, specific gain rather than a foundational advance.

The paper tackles robustness to domain shifts in remote physiological sensing from video by introducing the Target Signal Constrained Factorization module (TSFM) and the MMRPhys architecture. In cross-dataset evaluation, the authors report that MMRPhys with TSFM significantly outperforms state-of-the-art methods for rPPG and rRSP estimation while maintaining minimal inference latency.

Remote physiological sensing using camera-based technologies offers transformative potential for non-invasive vital sign monitoring across healthcare and human-computer interaction domains. Although deep learning approaches have advanced the extraction of physiological signals from video data, existing methods have not been sufficiently assessed for their robustness to domain shifts. These shifts in remote physiological sensing include variations in ambient conditions, camera specifications, head movements, facial poses, and physiological states, which often impact real-world performance significantly. Cross-dataset evaluation provides an objective measure to assess generalization capabilities across these domain shifts. We introduce the Target Signal Constrained Factorization module (TSFM), a novel multidimensional attention mechanism that explicitly incorporates physiological signal characteristics as factorization constraints, allowing more precise feature extraction. Building on this innovation, we present MMRPhys, an efficient dual-branch 3D-CNN architecture designed for simultaneous multitask estimation of remote photoplethysmography (rPPG) and respiratory (rRSP) signals from multimodal RGB and thermal video inputs. Through comprehensive cross-dataset evaluation on five benchmark datasets, we demonstrate that MMRPhys with TSFM significantly outperforms state-of-the-art methods in generalization across domain shifts for rPPG and rRSP estimation, while maintaining a minimal inference latency suitable for real-time applications. Our approach establishes new benchmarks for robust multitask and multimodal physiological sensing and offers a computationally efficient framework for practical deployment in unconstrained environments. The web browser-based application featuring on-device real-time inference of the MMRPhys model is available at https://physiologicailab.github.io/mmrphys-live
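To make the idea of factorization-based attention more concrete, the sketch below shows one generic way such a mechanism could work. It is not the authors' TSFM implementation: the abstract does not specify the factorization, so the rank-1 non-negative factorization, the choice of the temporal axis as one factor, and the function names `rank1_nmf` and `factorized_attention` are all assumptions made for illustration.

```python
import numpy as np


def rank1_nmf(V, n_iter=50, eps=1e-8, seed=0):
    """Approximate a non-negative matrix V (T x N) by outer(w, h).

    Uses multiplicative updates for the Frobenius loss, specialised
    to rank 1. Illustrative only; the paper's factorization may differ.
    """
    rng = np.random.default_rng(seed)
    T, N = V.shape
    w = rng.random(T) + eps  # temporal factor (hypothetical role)
    h = rng.random(N) + eps  # channel-spatial factor (hypothetical role)
    for _ in range(n_iter):
        h *= (V.T @ w) / (h * (w @ w) + eps)
        w *= (V @ h) / (w * (h @ h) + eps)
    return w, h


def factorized_attention(features, n_iter=50):
    """Re-weight a (C, T, H, W) feature volume by its rank-1 approximation."""
    C, T, H, W = features.shape
    shifted = features - features.min()  # the factorization needs non-negative input
    # Put time on the rows so that one factor is a single temporal vector.
    V = shifted.transpose(1, 0, 2, 3).reshape(T, C * H * W)
    w, h = rank1_nmf(V, n_iter)
    attn = np.outer(w, h).reshape(T, C, H, W).transpose(1, 0, 2, 3)
    attn = attn / (attn.max() + 1e-8)  # scale attention weights to [0, 1]
    return features * attn, w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.random((4, 16, 3, 3))  # toy feature volume: C=4, T=16, H=W=3
    out, temporal = factorized_attention(feats)
    print(out.shape, temporal.shape)
```

Restricting the approximation to rank 1 forces all channels and spatial locations to share one temporal component. That is one plausible reading of constraining the factorization by the characteristics of a single target signal, but the paper should be consulted for the actual formulation.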

