LG · MM · Aug 22, 2024

Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

arXiv:2408.14491v2 · 5 citations · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This is an incremental review that synthesizes existing research to guide practitioners and researchers in multimodal educational technology.

This paper tackles the lack of a comprehensive review of empirical methods in applied multimodal learning and training environments by introducing a taxonomy and framework, revealing that integrating modalities enables richer insights into learner behaviors but faces challenges in data collection and integration for real-time classroom use.

Recent technological advancements in multimodal machine learning--including the rise of large language models (LLMs)--have improved our ability to collect, process, and analyze diverse multimodal data such as speech, video, and eye gaze in learning and training contexts. While prior reviews have addressed individual components of the multimodal pipeline (e.g., conceptual models, data fusion), a comprehensive review of empirical methods in applied multimodal environments remains notably absent. This review addresses that gap, introducing a taxonomy and framework that capture both established practices and recent innovations driven by LLMs and generative AI. We identify five modality groups: Natural Language, Vision, Physiological Signals, Human-Centered Evidence, and Environment Logs. Our analysis shows that integrating modalities enables richer insights into learner and trainee behaviors, exposing latent patterns often overlooked by unimodal approaches. However, persistent challenges in multimodal data collection and integration continue to hinder the adoption of these systems in real-time classroom settings.
