Boyi Ma

9.6HCJul 13

HEART-Watch: A multimodal physiological dataset from a Google Pixel Watch across different physical states

Jathushan Kaetheeswaran, Boyi Ma, Ali Abedi et al.

Consumer-grade smartwatches offer a new option for personalized health monitoring for general consumers, as cardiovascular diseases continue to prevail as the leading cause of global mortality. The development and validation of reliable cardiovascular monitoring algorithms for these consumer-grade devices requires realistic biosignal data from diverse sets of participants. However, the availability of public consumer-grade smartwatch datasets with synchronized cardiovascular biosignals remains limited, and existing datasets often lack rich demographic diversity in their participant cohorts, potentially leading to biased algorithm development. This paper presents HEART-Watch, a multimodal physiological dataset of synchronized wrist-worn Google Pixel Watch electrocardiogram (ECG), photoplethysmography, and accelerometer signals from a diverse cohort of 40 healthy adults across three physical states - sitting, standing and walking - alongside reference chest ECG. Intermittent upper arm blood pressure measurements and concurrent biosignals were collected as an additional biomarker for future research. The motivation, methodology, and initial analyses of results are presented. HEART-Watch is intended to support the development and benchmarking of robust cardiovascular algorithms on consumer-grade smartwatches across diverse populations.

8.4CVMay 3, 2025

Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement

Long Bai, Boyi Ma, Ruohan Wang et al.

Surgical workflow recognition is vital for automating tasks, supporting decision-making, and training novice surgeons, ultimately improving patient safety and standardizing procedures. However, data corruption can lead to performance degradation due to issues like occlusion from bleeding or smoke in surgical scenes and problems with data storage and transmission. In this case, we explore a robust graph-based multimodal approach to integrating vision and kinematic data to enhance accuracy and reliability. Vision data captures dynamic surgical scenes, while kinematic data provides precise movement information, overcoming limitations of visual recognition under adverse conditions. We propose a multimodal Graph Representation network with Adversarial feature Disentanglement (GRAD) for robust surgical workflow recognition in challenging scenarios with domain shifts or corrupted data. Specifically, we introduce a Multimodal Disentanglement Graph Network that captures fine-grained visual information while explicitly modeling the complex relationships between vision and kinematic embeddings through graph-based message modeling. To align feature spaces across modalities, we propose a Vision-Kinematic Adversarial framework that leverages adversarial training to reduce modality gaps and improve feature consistency. Furthermore, we design a Contextual Calibrated Decoder, incorporating temporal and contextual priors to enhance robustness against domain shifts and corrupted data. Extensive comparative and ablation experiments demonstrate the effectiveness of our model and proposed modules. Moreover, our robustness experiments show that our method effectively handles data corruption during storage and transmission, exhibiting excellent stability and robustness. Our approach aims to advance automated surgical workflow recognition, addressing the complexities and dynamism inherent in surgical procedures.

Boyi Ma

2 Papers