SocialPulse: On-Device Detection of Social Interactions in Naturalistic Settings Using Smartwatch Multimodal Sensing
For researchers and developers of wearable sensing systems, this work demonstrates the feasibility of detecting diverse social interactions in real-world settings, addressing the gap between controlled lab studies and naturalistic deployment.
The paper presents SocialPulse, an on-device system for detecting social interactions in naturalistic settings using smartwatch multimodal sensing. In a real-world deployment with 38 participants and over 900 hours of data, the system detected 1,691 interactions, of which 77.28% were confirmed via participant self-report. Its foreground speech detector achieved a balanced accuracy of 85.51%, outperforming prior work by 5.11%.
Social interactions are fundamental to well-being, yet automatically detecting them in daily life, particularly with wearables, remains underexplored. Most existing systems are evaluated in controlled settings, focus primarily on in-person interactions, or rely on restrictive assumptions (e.g., requiring multiple speakers within fixed temporal windows), which limits their generalizability to real-world use. We present an on-watch interaction detection system designed to capture diverse interactions in naturalistic settings. A core component is a foreground speech detector trained on a public dataset. Evaluated on over 100,000 labeled foreground speech and background sound instances, the detector achieves a balanced accuracy of 85.51%, outperforming prior work by 5.11%. We evaluated the system in a real-world deployment (N=38) with over 900 hours of total smartwatch wear time. The system detected 1,691 interactions, of which 77.28% were confirmed via participant self-report, with durations ranging from under one minute to over one hour. Among correct detections, 81.45% were in-person, 15.7% virtual, and 1.85% hybrid. We further developed a 15-second window-level audio-only model that enables faster interaction prediction, achieving a balanced accuracy of 90.39% and a sensitivity of 91.01% on 33,698 labeled windows. These results demonstrate the feasibility of real-world interaction sensing and open the door to adaptive, context-aware systems that respond to users' dynamic social environments.
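The detector and the window-level model are reported in terms of balanced accuracy, the mean of sensitivity (true positive rate) and specificity (true negative rate). Unlike plain accuracy, it is not inflated when one class dominates, which matters when most audio windows contain no interaction. The sketch below shows how such metrics could be computed over non-overlapping 15-second windows. It is illustrative only, not the authors' pipeline: the function names `window_bounds` and `window_metrics` are hypothetical, and the 16 kHz sample rate, the non-overlapping windows, and the dropping of a trailing partial window are all assumptions.

```python
from typing import List, Sequence, Tuple


def window_bounds(num_samples: int, sample_rate: int,
                  window_s: float = 15.0) -> List[Tuple[int, int]]:
    """Return [start, end) sample indices of non-overlapping windows.

    Assumption: windows do not overlap and a trailing partial window is dropped.
    """
    size = int(window_s * sample_rate)
    return [(s, s + size) for s in range(0, num_samples - size + 1, size)]


def window_metrics(y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, balanced_accuracy) for binary labels.

    Label 1 = interaction (positive class), 0 = no interaction.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present to compute balanced accuracy")
    sensitivity = tp / (tp + fn)   # fraction of interaction windows detected
    specificity = tn / (tn + fp)   # fraction of non-interaction windows rejected
    return sensitivity, specificity, (sensitivity + specificity) / 2


if __name__ == "__main__":
    # 40 s of audio at an assumed 16 kHz yields two full 15 s windows.
    print(window_bounds(16000 * 40, 16000))  # [(0, 240000), (240000, 480000)]
    # Toy, imbalanced labels: 8 interaction windows, 2 non-interaction windows.
    y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
    print(window_metrics(y_true, y_pred))  # (0.875, 0.5, 0.6875)
```

In this toy example, plain accuracy would be 8/10 = 0.8, while balanced accuracy is 0.6875 because half of the minority-class windows are misclassified. This gap is why balanced accuracy is the more informative figure for imbalanced window labels.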