HC · CV · Mar 14

Toward Scalable Co-located Practical Learning: Assisting with Computer Vision and Multimodal Analytics

arXiv:2603.13679 · 33.5 · h-index: 21
AI Analysis

This work addresses the challenge of scalable, non-invasive monitoring of teamwork and engagement in practical learning environments like nursing simulations, offering an incremental improvement over sensor-based methods.

The study analyzes fine-grained learning behaviors in co-located practical settings using a single ceiling-mounted camera and a YOLO-based detector, reporting high annotation reliability (F1 = 0.933) and strong detection performance (mAP@0.5 = 0.827). Combining behavior labels with spatial context revealed clear differences between high- and low-performing teams; for example, higher performers showed more patient interaction in the primary work area.
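Both the annotation F1 and the detector's precision/recall rest on deciding when two bounding boxes refer to the same behaviour. The summary does not say how the authors matched boxes, so the sketch below assumes one common convention: greedy one-to-one matching where a pair counts as a true positive if the labels agree and the intersection-over-union (IoU) is at least 0.5, the same threshold used in mAP@0.5. The `Box` type, `match_f1` function and the class names in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box with a behaviour label (hypothetical schema)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_f1(ref: list[Box], other: list[Box], thr: float = 0.5) -> tuple[float, float, float]:
    """Greedy one-to-one matching (assumed convention): highest-IoU same-label pairs first.

    Returns (precision, recall, F1) of `other` measured against `ref`.
    """
    pairs = sorted(
        ((iou(r, o), i, j)
         for i, r in enumerate(ref)
         for j, o in enumerate(other)
         if r.label == o.label),
        reverse=True,
    )
    used_ref: set[int] = set()
    used_other: set[int] = set()
    tp = 0
    for score, i, j in pairs:
        if score < thr:
            break  # pairs are sorted, so all remaining pairs are below threshold
        if i in used_ref or j in used_other:
            continue  # each box may be matched at most once
        used_ref.add(i)
        used_other.add(j)
        tp += 1
    precision = tp / len(other) if other else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: two annotators labelling the same frame (hypothetical boxes and class names).
annotator_a = [Box("patient_interaction", 0, 0, 10, 10), Box("phone_use", 20, 20, 30, 30)]
annotator_b = [Box("patient_interaction", 1, 1, 11, 11), Box("phone_use", 50, 50, 60, 60)]
print(match_f1(annotator_a, annotator_b))
```

The first pair overlaps with IoU of about 0.68 and counts as a match; the two `phone_use` boxes do not overlap, so one of two boxes is matched on each side. Swapping the roles of `ref` and `other` turns the same function into a detector-versus-ground-truth evaluation.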

This study examined whether a single ceiling-mounted camera could be used to capture fine-grained learning behaviours in co-located practical learning. In undergraduate nursing simulations, teachers first identified seven observable behaviour categories, which were then used to train a YOLO-based detector. Video data were collected from 52 sessions, and analyses focused on Scenario A because it produced greater behavioural variation than Scenario B. Annotation reliability was high (F1=0.933). On the held-out test set, the model achieved a precision of 0.789, a recall of 0.784, and an mAP@0.5 of 0.827. When only behaviour frequencies were compared, no robust differences were found between high- and low-performing groups. However, when behaviour labels were analysed together with spatial context, clear differences emerged in both task and collaboration performance. Higher-performing teams showed more patient interaction in the primary work area, whereas lower-performing teams showed more phone-related activity and more activity in secondary areas. These findings suggest that behavioural data are more informative when interpreted together with where they occur. Overall, the study shows that a single-camera computer vision approach can support the analysis of teamwork and task engagement in face-to-face practical learning without relying on wearable sensors.
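The key finding is that counting behaviours alone did not separate high- and low-performing groups, while pairing each behaviour with where it happened did. A minimal way to build such behaviour-by-zone features from detector output is to map each box centre to a spatial zone and count (label, zone) pairs. The zone rectangle, the `ZONES`, `zone_of` and `behaviour_by_zone` names, and the class labels below are hypothetical; the abstract does not describe how the primary and secondary areas were delimited.

```python
from collections import Counter

# Hypothetical zone layout in ceiling-camera pixel coordinates: (x_min, y_min, x_max, y_max).
# Anything outside the listed zones is treated as "secondary" (an assumption).
ZONES = {
    "primary": (200, 150, 800, 600),  # e.g. the area around the patient bed
}


def zone_of(cx: float, cy: float) -> str:
    """Return the name of the zone containing point (cx, cy)."""
    for name, (x1, y1, x2, y2) in ZONES.items():
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            return name
    return "secondary"


def behaviour_counts(detections) -> Counter:
    """Frequency-only view: count detections per behaviour label."""
    return Counter(label for label, *_ in detections)


def behaviour_by_zone(detections) -> Counter:
    """Spatially contextualised view: count detections per (label, zone) pair.

    `detections` is an iterable of (label, x1, y1, x2, y2) tuples.
    """
    counts: Counter = Counter()
    for label, x1, y1, x2, y2 in detections:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2  # use the box centre as the location
        counts[(label, zone_of(cx, cy))] += 1
    return counts


def proportions(counts: Counter) -> dict:
    """Normalise counts so teams with different session lengths can be compared."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


# Usage with hypothetical detections from one team's session.
team = [
    ("patient_interaction", 300, 200, 400, 300),  # centre (350, 250): primary
    ("phone_use", 900, 100, 950, 150),            # centre (925, 125): secondary
    ("patient_interaction", 100, 100, 150, 150),  # centre (125, 125): secondary
]
print(behaviour_counts(team))
print(proportions(behaviour_by_zone(team)))
```

In this toy example the frequency view reports two patient-interaction detections, but the zone view shows that only one of them occurred in the primary area, which is the kind of distinction a frequency-only comparison hides.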

