Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts

Chantale Lauer, Peter Pfeiffer, Nijat Mehdiyev

arXiv:2603.12895

AI Analysis

This paper addresses human factors such as trust and usability in LLM-based modeling tools for business process management. Its contribution is incremental, since it evaluates a tool rather than developing a novel method.

The study examines the integration of LLMs into business process modeling tools by evaluating an LLM-powered BPMN copilot with domain experts. It reveals a tension between acceptable usability (mean CUQ score: 67.2/100) and low trust (mean score: 48.8%), with reliability as a key concern.

Integrating Large Language Models (LLMs) into business process management tools promises to democratize Business Process Model and Notation (BPMN) modeling for non-experts. While automated frameworks assess syntactic and semantic quality, they miss human factors like trust, usability, and professional alignment. We conducted a mixed-methods evaluation of our proposed solution, an LLM-powered BPMN copilot, with five process modeling experts using focus groups and standardized questionnaires. Our findings reveal a critical tension between acceptable perceived usability (mean CUQ score: 67.2/100) and notably lower trust (mean score: 48.8%), with reliability rated as the most critical concern (M=1.8/5). Furthermore, we identified output-quality issues, prompting difficulties, and a need for the LLM to ask more in-depth clarifying questions about the process. We envision five use cases ranging from domain-expert support to enterprise quality assurance. We demonstrate the necessity of human-centered evaluation complementing automated benchmarking for LLM modeling agents.
