All That Glitters Is Not Gold: Towards Process Discovery Techniques with Guarantees
This paper addresses a foundational problem for process mining researchers and practitioners by exposing a lack of reliability guarantees in current process discovery techniques.
Existing process discovery algorithms have a critical flaw: they do not guarantee that higher-quality input event data leads to higher-quality discovered process models. The authors demonstrate this disconnect using a range of quality measures for both event data and models, and urge the community to develop algorithms with guarantees that relate input quality to output quality.
The aim of a process discovery algorithm is to construct, from event data, a process model that describes the underlying real-world process well. Intuitively, the better the quality of the event data, the better the quality of the discovered model. However, existing process discovery algorithms do not guarantee this relationship. We demonstrate this using a range of quality measures for both event data and discovered process models. This paper is a call to the community of information systems (IS) engineers to complement their process discovery algorithms with properties that relate the quality of their inputs to the quality of their outputs. To this end, we distinguish four incremental stages for the development of such algorithms, along with concrete guidelines for formulating relevant properties and validating them experimentally. We also use these stages to reflect on the state of the art; this reflection shows the need to move forward in our thinking about algorithmic process discovery.
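The intuition that better event data should yield a better model can be read as a monotonicity property between an input quality score and an output quality score. The sketch below is one possible way to check such a property experimentally; it is not the authors' method. The function name `monotonicity_violations`, the representation of each run as a pair of scores, and the example numbers are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from itertools import combinations
from typing import List, Tuple

# One run = (event data quality score, discovered model quality score).
# Both scores are hypothetical; any measure where "higher is better" fits.
Run = Tuple[float, float]


def monotonicity_violations(runs: List[Run]) -> List[Tuple[Run, Run]]:
    """Return pairs of runs where strictly better input quality
    came with strictly worse output quality."""
    violations = []
    for a, b in combinations(runs, 2):
        input_diff = a[0] - b[0]
        output_diff = a[1] - b[1]
        # Opposite signs: quality of input and output moved in
        # opposite directions, so the guarantee does not hold here.
        if input_diff * output_diff < 0:
            violations.append((a, b))
    return violations


if __name__ == "__main__":
    # Made-up scores for three event logs of increasing quality.
    example_runs = [(0.5, 0.70), (0.7, 0.80), (0.9, 0.75)]
    for worse_input, better_input in monotonicity_violations(example_runs):
        print("violation:", worse_input, "vs", better_input)
```

An empty result on a set of runs does not establish a guarantee; it only means that no counterexample was observed for the chosen measures and data.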