Restoring Eye Contact to the Virtual Classroom with Machine Learning
This work addresses the loss of nonverbal communication, specifically eye contact, among participants in virtual music classrooms, although it is an incremental application of existing methods to a new domain.
The paper tackles the loss of nonverbal communication in virtual music classrooms by developing a machine learning system that estimates a user's gaze target from a single camera frame and overlays that target on the user's video feed to restore eye contact. The system achieved inference times and accuracy suitable for videoconferencing, and pilot studies showed improved cue interpretation, with students reporting enhanced collaboration.
Nonverbal communication, in particular eye contact, is a critical element of the music classroom, shown to keep students on task, coordinate musical flow, and communicate improvisational ideas. Unfortunately, this nonverbal aspect of performance and pedagogy is lost in the virtual classroom. In this paper, we propose a machine learning system that uses single-instance, single-camera image frames as input to estimate the gaze target of a user seated in front of their computer. The system augments the user's video feed with a display of the estimated gaze target, thereby restoring nonverbal communication of directed gaze. The proposed estimation system consists of modular machine learning blocks, leading to a target-oriented (rather than coordinate-oriented) gaze prediction. We instantiate one example of the complete system and use it to run a pilot study in a virtual music classroom over Zoom. Inference time and accuracy meet benchmarks for videoconferencing applications, and quantitative and qualitative results from the pilot experiments include improved success in cue interpretation and student-reported formation of collaborative, communicative relationships between conductor and musician.
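To make the distinction between target-oriented and coordinate-oriented prediction concrete, here is a minimal sketch. It assumes a hypothetical modular pipeline in which an upstream block (not shown) maps a single camera frame to one raw score per on-screen target, such as each participant tile in a Zoom gallery view. The names `GazeTarget`, `predict_target`, `overlay_label`, and the `min_confidence` threshold are illustrative assumptions, not the authors' implementation. Instead of regressing (x, y) screen coordinates, the final block treats gaze estimation as classification over targets and returns a label that could be drawn on the user's video feed.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GazeTarget:
    """A hypothetical on-screen target, e.g. one participant tile."""
    name: str


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax over raw per-target scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_target(
    scores: Sequence[float],
    targets: Sequence[GazeTarget],
    min_confidence: float = 0.5,
) -> Optional[GazeTarget]:
    """Target-oriented prediction: choose the most probable target.

    `scores` stands in for the output of an assumed upstream block that
    maps one camera frame to one score per target. Returns None when no
    target is confident enough (e.g. the user is looking off-screen).
    """
    if len(scores) != len(targets):
        raise ValueError("need exactly one score per target")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None
    return targets[best]


def overlay_label(target: Optional[GazeTarget]) -> str:
    """Text that could be overlaid on the user's video feed."""
    return f"Looking at: {target.name}" if target else "Looking at: (none)"


if __name__ == "__main__":
    tiles = [GazeTarget("Conductor"), GazeTarget("Violin 1"), GazeTarget("Cello")]
    # Illustrative scores only; these are not data from the pilot study.
    print(overlay_label(predict_target([0.2, 3.1, 0.4], tiles)))
```

Because the output is a discrete target rather than a screen coordinate, small errors in the estimated gaze point matter only when they change which tile is chosen, and the confidence threshold gives the overlay an explicit "no target" state.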