AI | Sep 18, 2025

Knowledge-Driven Hallucination in Large Language Models: An Empirical Study on Process Modeling

Humam Kourani, Anton Antonov, Alessandro Berti, Wil M. P. van der Aalst

arXiv:2509.15336v1 | 1 citation | h-index: 10

Originality: Incremental advance

AI Analysis

This study addresses a critical reliability issue for users in evidence-based domains like Business Process Management, highlighting the need for rigorous validation of AI-generated artifacts.

The paper investigates knowledge-driven hallucination in Large Language Models (LLMs), in which a model's output contradicts explicit source evidence because its internal knowledge overrides that evidence. It does so by evaluating LLMs on automated process modeling tasks that contain controlled conflicts between the provided evidence and the models' background knowledge.

The utility of Large Language Models (LLMs) in analytical tasks is rooted in their vast pre-trained knowledge, which allows them to interpret ambiguous inputs and infer missing information. However, this same capability introduces a critical risk of what we term knowledge-driven hallucination: a phenomenon where the model's output contradicts explicit source evidence because it is overridden by the model's generalized internal knowledge. This paper investigates this phenomenon by evaluating LLMs on the task of automated process modeling, where the goal is to generate a formal business process model from a given source artifact. The domain of Business Process Management (BPM) provides an ideal context for this study, as many core business processes follow standardized patterns, making it likely that LLMs possess strong pre-trained schemas for them. We conduct a controlled experiment designed to create scenarios with deliberate conflict between provided evidence and the LLM's background knowledge. We use inputs describing both standard and deliberately atypical process structures to measure the LLM's fidelity to the provided evidence. Our work provides a methodology for assessing this critical reliability issue and raises awareness of the need for rigorous validation of AI-generated artifacts in any evidence-based domain.
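The abstract does not say how fidelity to the evidence is scored. The sketch below is purely illustrative and is not the authors' metric. It assumes that a process model can be reduced to a set of "activity A happens before activity B" pairs. It then compares an LLM-generated model both with the evidence and with a "textbook" version of the same process. The function name `fidelity_report`, the metric names, and the order-to-cash activities are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical simplification: a process model reduced to ordering pairs
# (earlier activity, later activity). Not the paper's representation.
Relation = Tuple[str, str]


def fidelity_report(
    evidence: Set[Relation],
    generated: Set[Relation],
    standard: Set[Relation],
) -> Dict[str, float]:
    """Illustrative fidelity scores for a generated process model.

    evidence  -- ordering relations stated in the source artifact
    generated -- ordering relations found in the LLM-produced model
    standard  -- relations of the typical process the LLM likely knows
    """
    supported = generated & evidence
    unsupported = generated - evidence
    # Unsupported relations that match the standard schema suggest that
    # prior knowledge, rather than the evidence, shaped the output.
    from_prior = unsupported & standard
    return {
        "recall": len(supported) / len(evidence) if evidence else 1.0,
        "precision": len(supported) / len(generated) if generated else 1.0,
        "prior_intrusion": len(from_prior) / len(unsupported) if unsupported else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical "textbook" order-to-cash process.
    standard = {
        ("receive order", "check credit"),
        ("check credit", "ship goods"),
        ("ship goods", "send invoice"),
    }
    # Atypical evidence: the invoice is sent before the goods are shipped.
    evidence = {
        ("receive order", "check credit"),
        ("check credit", "send invoice"),
        ("send invoice", "ship goods"),
    }
    # A generated model that silently reverts to the standard ordering.
    generated = set(standard)
    print(fidelity_report(evidence, generated, standard))
```

In this toy run, the generated model recovers only one of the three evidence relations, so recall and precision are both 1/3. Every relation it adds without support comes from the standard schema, which gives a `prior_intrusion` of 1.0. Under this simplified view, that pattern is the signature of knowledge-driven hallucination. Real process models, such as BPMN diagrams or Petri nets, also express concurrency and choice, and a flat set of ordering pairs cannot capture either.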

