HC · ET · Mar 31

MURMR: A Multimodal Sensing Framework for Automated Group Behavior Analysis in Mixed Reality

Diana Romero, Yasra Chandio, Fatima Anwar, Salma Elmalaki

arXiv:2507.11797 · 6.81 citations

Predicted impact: top 91% in HC · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the problem of detecting collaboration breakdowns among teams in immersive environments and offers an automated, annotation-free method, though it appears incremental because it builds on existing sensing and analysis techniques.

The paper tackles automated group behavior analysis in mixed reality (MR) by introducing MURMR, a passive sensing framework that captures multimodal data from MR headsets without external instrumentation. In a deployment with 48 participants, intra-session analysis captured significant variability that session-level aggregation lost, and the temporal module identified behavioral phases with 83% correspondence to video observations.
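To illustrate why session-level aggregation can hide within-session variability, here is a minimal Python sketch. It is not MURMR's implementation. It assumes a hypothetical per-tick gaze stream of `(tick, looker, target)` tuples. It builds a directed, weighted gaze sociogram, where each edge weight is the share of a participant's samples spent looking at another participant, and it uses weighted density as a stand-in network metric. The function names, the input format, and the window size are illustrative assumptions.

```python
from collections import defaultdict
from statistics import pstdev


def gaze_sociogram(samples, participants):
    """Hypothetical directed weighted sociogram.

    samples: iterable of (tick, looker, target), target is a participant id or None.
    Edge weight w[(i, j)] = fraction of i's samples in which i looked at j.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for _, looker, target in samples:
        totals[looker] += 1
        if target is not None and target != looker:
            counts[(looker, target)] += 1
    return {
        (i, j): counts[(i, j)] / totals[i] if totals[i] else 0.0
        for i in participants
        for j in participants
        if i != j
    }


def density(weights, participants):
    """Weighted density: mean edge weight over all ordered participant pairs."""
    n = len(participants)
    return sum(weights.values()) / (n * (n - 1)) if n > 1 else 0.0


def windowed_density(samples, participants, window):
    """Intra-session view: one density value per non-overlapping window of ticks."""
    buckets = defaultdict(list)
    for sample in samples:
        buckets[sample[0] // window].append(sample)
    return [density(gaze_sociogram(buckets[k], participants), participants)
            for k in sorted(buckets)]


# Synthetic session: A and B hold mutual gaze for the first half, then nobody
# looks at anyone; C never looks at a teammate.
participants = ["A", "B", "C"]
samples = []
for t in range(10):
    engaged = t < 5
    samples.append((t, "A", "B" if engaged else None))
    samples.append((t, "B", "A" if engaged else None))
    samples.append((t, "C", None))

session = density(gaze_sociogram(samples, participants), participants)
per_window = windowed_density(samples, participants, window=5)
print(f"session density: {session:.3f}")
print("per-window density:", [round(d, 3) for d in per_window],
      "spread:", round(pstdev(per_window), 3))
```

In this toy session the session-level density is about 0.167. The two windows, however, come out at about 0.333 and 0.0. The single aggregate therefore hides a shift from tight mutual gaze to none at all. Normalizing by each looker's sample count keeps weights comparable across windows of different lengths, but that normalization is a choice made for this sketch.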

When teams coordinate in immersive environments, collaboration breakdowns can go undetected without automated analysis, directly affecting task performance. Yet existing methods rely on external observation and manual annotation, offering no annotation-free method for analyzing temporal collaboration dynamics from headset-native data. We introduce MURMR, a passive sensing pipeline that captures and analyzes multimodal interaction data from commodity MR headsets without external instrumentation. Two complementary modules address different levels of analysis: a structural module that generates automated multimodal sociograms and network metrics at both session and intra-session granularities, and a temporal module that applies unsupervised deep clustering to identify moment-to-moment dyadic behavioral phases without predefined taxonomies. An exploratory deployment with 48 participants in a co-located object-sorting task reveals that intra-session structural analysis captures significant within-session variability lost in session-level aggregation, with gaze, audio, and position contributing non-redundantly. The temporal module identifies five behavioral phases with 83% correspondence to video observations. Cross-tabulation shows that behavioral transitions consistently occur within structurally stable states, demonstrating that the two modules capture complementary dynamics. These results establish that passive headset sensing provides meaningful signal for automated, multi-level collaboration analysis in immersive environments.
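The cross-tabulation between the two modules can be approximated with a short sketch. This is a hedged illustration of one possible operationalization, not the paper's procedure. It assumes two hypothetical label sequences aligned tick by tick. The first holds behavioral-phase labels, such as cluster ids from a temporal model. The second holds structural-state labels, such as a discretized network-metric level. A phase transition counts as "within a stable state" when the structural label is the same on both sides of the transition tick. The function name and labels are invented for illustration.

```python
from collections import Counter


def transition_crosstab(phases, states):
    """Cross-tabulate behavioral-phase transitions against structural states.

    phases[t]: behavioral phase label at tick t (hypothetical cluster ids).
    states[t]: structural state label at tick t (hypothetical discretized levels).
    Returns (table, n_within, n_boundary):
      table[(state, from_phase, to_phase)] counts transitions whose neighboring
      ticks share one structural state; n_boundary counts transitions that
      coincide with a structural state change.
    """
    if len(phases) != len(states):
        raise ValueError("phases and states must be aligned tick by tick")
    table = Counter()
    n_boundary = 0
    for t in range(1, len(phases)):
        if phases[t] == phases[t - 1]:
            continue  # no behavioral transition at this tick
        if states[t] == states[t - 1]:
            table[(states[t], phases[t - 1], phases[t])] += 1
        else:
            n_boundary += 1
    return table, sum(table.values()), n_boundary


# Toy example: three phase transitions, one of which lines up with the
# structural change from "hi" to "lo" at tick 5.
phases = [0, 0, 1, 1, 1, 2, 2, 0, 0, 0]
states = ["hi"] * 5 + ["lo"] * 5
table, n_within, n_boundary = transition_crosstab(phases, states)
print(dict(table), "within:", n_within, "boundary:", n_boundary)
```

Here two of the three transitions fall inside a single structural state and one coincides with a structural change. In a real analysis one would aggregate such counts across dyads and sessions before drawing any conclusion.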

