A Multimodal Machine Learning Framework for Teacher Vocal Delivery Evaluation
This work addresses the need for objective, scalable evaluation of teacher performance in education, though it appears incremental, since it applies existing multimodal techniques to a specific domain.
The paper tackles the problem of subjective and time-consuming manual evaluation of teacher vocal delivery by developing a multimodal machine learning framework that uses pairwise comparisons and orthogonal fusing to assess fluency and passion, demonstrating effectiveness on real-world datasets.
The quality of vocal delivery is one of the key indicators for evaluating teacher enthusiasm, which is widely accepted to be connected to overall course quality. However, existing evaluation of vocal delivery relies mainly on manual ratings, which are both subjective and time-consuming. In this paper, we present a novel machine learning approach that utilizes pairwise comparisons and a multimodal orthogonal fusing algorithm to generate large-scale objective evaluation results of teacher vocal delivery in terms of fluency and passion. We collect two datasets from real-world education scenarios, and the experimental results demonstrate the effectiveness of our algorithm. To encourage reproducible results, we make our code publicly available at \url{https://github.com/tal-ai/ML4VocalDelivery.git}.
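To make the two ingredients named above more concrete, the following is a minimal sketch of how a pairwise comparison objective might be combined with an orthogonality term between two modality embeddings. It is not taken from the released code: the function names, the logistic form of the pairwise loss, the squared dot product used as the orthogonality penalty, and the `weight` parameter are all illustrative assumptions, and the actual orthogonal fusing algorithm may differ.

```python
import math


def pairwise_logistic_loss(score_a, score_b, a_preferred):
    """Logistic loss on the score margin for one pairwise comparison.

    Assumption: a rater judged clip A better than clip B when a_preferred is True.
    """
    margin = score_a - score_b if a_preferred else score_b - score_a
    # log(1 + exp(-margin)), written so that exp never overflows
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def orthogonality_penalty(audio_vec, text_vec):
    """Squared dot product of two embeddings; zero when they are orthogonal.

    Assumption: one simple way to discourage redundant modality features.
    """
    dot = sum(a * t for a, t in zip(audio_vec, text_vec))
    return dot * dot


def total_loss(score_a, score_b, a_preferred, audio_vec, text_vec, weight=0.1):
    """Pairwise loss plus a weighted orthogonality term (hypothetical weighting)."""
    ranking = pairwise_logistic_loss(score_a, score_b, a_preferred)
    return ranking + weight * orthogonality_penalty(audio_vec, text_vec)
```

Under these assumptions, a pair whose predicted scores agree with the rater's preference yields a small ranking loss, and the penalty vanishes when the audio and text embeddings are orthogonal, so the two modalities are pushed toward carrying complementary information.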