MONAH: Multi-Modal Narratives for Humans to analyze conversations
This work provides an automated tool that reduces the time researchers spend analyzing video-recorded conversations, though it appears incremental because it builds on existing multimodal annotation methods.
The paper tackles the problem of manually weaving multimodal information into conversation transcripts by introducing an automated system that expands verbatim transcripts using multimodal data streams. Expanding the range of multimodal annotations yields statistically significant improvements in detecting rapport-building.
In conversational analyses, humans manually weave multimodal information into the transcripts, which is time-consuming. We introduce a system that automatically expands the verbatim transcripts of video-recorded conversations using multimodal data streams. This system uses a set of preprocessing rules to weave multimodal annotations into the verbatim transcripts and promote interpretability. Our feature engineering contributions are two-fold: first, we identify the range of multimodal features relevant to detecting rapport-building; second, we expand the range of multimodal annotations and show that the expansion leads to statistically significant improvements in detecting rapport-building.
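To illustrate what rule-based weaving of multimodal annotations into a verbatim transcript might look like, the following is a minimal sketch and not the system described here. The `Turn` fields, the two rules, the thresholds (180 and 100 words per minute, a 2-second delay), and the output phrasing are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    """One talk-turn with hypothetical multimodal features attached."""
    speaker: str
    text: str
    words_per_minute: float  # assumed prosody feature
    delay_seconds: float     # assumed silence before the speaker starts


# A rule maps a turn to a short annotation, or None if nothing is notable.
Rule = Callable[[Turn], Optional[str]]


def speech_rate_rule(turn: Turn) -> Optional[str]:
    # Thresholds are illustrative assumptions, not values from the paper.
    if turn.words_per_minute >= 180:
        return "quickly"
    if turn.words_per_minute <= 100:
        return "slowly"
    return None


def delay_rule(turn: Turn) -> Optional[str]:
    # Illustrative threshold for a noticeable pause.
    if turn.delay_seconds >= 2.0:
        return "after a long pause"
    return None


def weave(turns: List[Turn], rules: List[Rule]) -> List[str]:
    """Expand each verbatim turn into a narrative line using the rules."""
    lines = []
    for turn in turns:
        notes = []
        for rule in rules:
            note = rule(turn)
            if note is not None:
                notes.append(note)
        annotation = (" " + ", ".join(notes)) if notes else ""
        lines.append(f'{turn.speaker} said{annotation}: "{turn.text}"')
    return lines


if __name__ == "__main__":
    turns = [
        Turn("Doctor", "How are you feeling today?", 190.0, 0.2),
        Turn("Patient", "Not great.", 90.0, 2.5),
    ]
    for line in weave(turns, [speech_rate_rule, delay_rule]):
        print(line)
```

Keeping each rule as a small function that returns a human-readable phrase is one way to preserve interpretability: the expanded transcript remains plain text that a human analyst can read, and adding a new annotation type only requires adding another rule to the list.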