CL · LG · SE · Mar 14, 2025

Evaluating the Process Modeling Abilities of Large Language Models -- Preliminary Foundations and Results

arXiv:2503.13520v1 · 1 citation · h-index: 36
Originality: Synthesis-oriented
AI Analysis

This work addresses the evaluation challenges for researchers and practitioners using LLMs in process modeling, but it is incremental as it primarily discusses existing issues without introducing new methods.

The paper tackles the problem of evaluating the process modeling abilities of large language models. It argues that current benchmarks are insufficient because an evaluation must account for quality, cost, and generation time together, which yields a set of Pareto-optimal variants rather than a single best model. It does not present specific results or numbers.

Large language models (LLMs) have revolutionized the processing of natural language. Although the first benchmarks of the process modeling abilities of LLMs are promising, it is currently under debate to what extent an LLM can generate good process models. In this contribution, we argue that evaluating the process modeling abilities of LLMs is far from trivial. Hence, available evaluation results must be interpreted with care. For example, even in a simple scenario, not only the quality of a model should be taken into account, but also the cost and time needed to generate it. Thus, an LLM does not generate one optimal solution, but a set of Pareto-optimal variants. Moreover, several further challenges have to be taken into account, e.g., the conceptualization of quality, the validation of results, generalizability, and data leakage. We discuss these challenges in detail and outline future experiments to tackle them scientifically.
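
The Pareto argument in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the `Candidate` type, the helper names, and all numbers below are hypothetical, and quality is assumed to be a single score where higher is better, while cost and time are better when lower.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One generated process model variant (hypothetical example data)."""
    name: str
    quality: float  # assumed: higher is better
    cost: float     # assumed: lower is better (e.g. API cost)
    seconds: float  # assumed: lower is better (generation time)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one."""
    no_worse = a.quality >= b.quality and a.cost <= b.cost and a.seconds <= b.seconds
    strictly_better = a.quality > b.quality or a.cost < b.cost or a.seconds < b.seconds
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up values, for illustration only.
candidates = [
    Candidate("A", quality=0.90, cost=0.50, seconds=40.0),
    Candidate("B", quality=0.80, cost=0.10, seconds=10.0),
    Candidate("C", quality=0.75, cost=0.20, seconds=15.0),  # dominated by B
    Candidate("D", quality=0.95, cost=1.20, seconds=90.0),
]

print([c.name for c in pareto_front(candidates)])  # ['A', 'B', 'D']
```

In this toy data, three of the four variants remain on the front, so ranking them by quality alone would hide the trade-off against cost and time.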
