HC CV LG | Feb 17, 2024

Supporting Experts with a Multimodal Machine-Learning-Based Tool for Human Behavior Analysis of Conversational Videos

Riku Arakawa, Kiyosu Maeda, Hiromu Yakura

arXiv:2402.11145v1 | 6.76 citations

Originality: Synthesis-oriented

AI Analysis

This tool addresses the efficiency and objectivity challenges that experts face in conversational analysis. However, it is incremental: it builds on existing machine learning methods and contributes mainly a new interface.

The researchers tackled the lack of user-friendly tools for multimodal scene search in conversational videos. Their solution is Providence, a visual-programming-based tool that lets experts combine machine learning algorithms without writing code. Their user study found favorable usability and satisfactory output with lower cognitive load. An in-the-wild trial confirmed the tool's objectivity and reusability and showed that it transformed experts' workflows.

Multimodal scene search of conversations is essential for unlocking valuable insights into social dynamics and enhancing our communication. Experts in conversational analysis have their own knowledge and skills for finding key scenes. However, they lack comprehensive, user-friendly tools that streamline the processing of diverse multimodal queries, and this impedes efficiency and objectivity. To address this, we developed Providence, a visual-programming-based tool built on design considerations derived from a formative study with experts. It enables experts to combine various machine learning algorithms to capture human behavioral cues without writing code. Our study showed favorable usability and satisfactory output, with less cognitive load imposed when accomplishing scene search tasks in conversations, which verifies the importance of the tool's customizability and transparency. Furthermore, an in-the-wild trial confirmed that the tool's objectivity and reusability transform experts' workflows, suggesting the advantage of expert-AI teaming in a highly human-contextual domain.
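The abstract describes experts combining machine learning outputs that capture behavioral cues into a scene query, without writing code. The example below is a minimal sketch of that idea as a small dataflow graph. It is not Providence's actual implementation. The node types, the per-frame boolean representation, and every name (`Node`, `run_graph`, `and_node`, `not_node`, `to_scenes`, `a_smiles`, `b_speaks`) are hypothetical. The input signals are hand-made stand-ins for detector output, not results from real models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: one boolean per video frame per behavioral cue.
Signal = List[bool]

@dataclass
class Node:
    """A hypothetical block in a visual-programming graph."""
    name: str
    inputs: List[str]
    fn: Callable[..., Signal]

def run_graph(nodes: List[Node], sources: Dict[str, Signal]) -> Dict[str, Signal]:
    # Evaluate nodes in list order, which is assumed to be topologically sorted.
    values = dict(sources)
    for node in nodes:
        values[node.name] = node.fn(*(values[i] for i in node.inputs))
    return values

def and_node(a: Signal, b: Signal) -> Signal:
    # A frame matches only if both cues are present in it.
    return [x and y for x, y in zip(a, b)]

def not_node(a: Signal) -> Signal:
    # Invert a cue, e.g. "B is speaking" becomes "B is silent".
    return [not x for x in a]

def to_scenes(signal: Signal, min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges (end exclusive) lasting at least min_len frames."""
    scenes: List[Tuple[int, int]] = []
    start = None
    for i, on in enumerate(signal + [False]):  # sentinel closes a trailing run
        if on and start is None:
            start = i
        elif not on and start is not None:
            if i - start >= min_len:
                scenes.append((start, i))
            start = None
    return scenes

# Stand-in cue signals for two participants, A and B (not real model output).
sources = {
    "a_smiles": [False, True, True, True, False, False, True, True, False, False],
    "b_speaks": [False, True, True, True, True, False, True, False, False, False],
}
nodes = [
    Node("smile_while_listening", ["a_smiles", "b_speaks"], and_node),
    Node("b_silent", ["b_speaks"], not_node),
    Node("smile_while_b_silent", ["a_smiles", "b_silent"], and_node),
]
values = run_graph(nodes, sources)
print(to_scenes(values["smile_while_listening"], min_len=2))  # [(1, 4)]
print(to_scenes(values["smile_while_b_silent"], min_len=1))   # [(7, 8)]
```

In this sketch, a visual editor would only need to let users wire cue sources into operator nodes. The final `to_scenes` step turns the combined per-frame signal into candidate scenes that an expert can review.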

