PLG2: Multiperspective Processes Randomization and Simulation for Online and Offline Settings
This work addresses the need for realistic synthetic data in process mining, particularly for evaluating algorithms in business process intelligence, although its contribution is incremental, since it builds on prior methods.
The paper extends an existing approach for generating random processes with support for multiperspective models and logs (enriching the control flow with time and data information) and for online settings such as event streams and concept drifts, aiming to cover a wide range of real-world scenarios.
Process mining is an important field in BPM and data mining research. Recently, it has also gained importance among practitioners: more and more companies are building business process intelligence solutions. Like any other data mining task, the evaluation of process mining algorithms requires large amounts of real-world data. Although such datasets are increasingly available, they suffer from several limitations, first and foremost the absence of a "gold standard" (i.e., the reference model). This paper extends an approach, already available in the literature, for the generation of random processes. The novelties introduced concern, in particular, complete support for multiperspective models and logs (i.e., the control-flow perspective is enriched with time and data information) and for online settings (i.e., the generation of multiperspective event streams and concept drifts). The resulting framework covers almost the entire spectrum of scenarios that can be observed in the real world. The approach is implemented as a publicly available Java application, with a set of APIs for the programmatic execution of experiments.
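The abstract does not describe how streams and drifts are generated, so the following is only a minimal sketch, assuming a strictly sequential process model, of what a multiperspective event stream with a sudden concept drift could look like: each event carries a control-flow part (case and activity), a time part (timestamp) and a data part (an attribute value). The class `DriftingStreamSketch`, its methods, the `amount` attribute and all timing constants are hypothetical and are not part of the PLG2 APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, NOT the PLG2 API: a toy multiperspective event
// stream containing one sudden concept drift.
public class DriftingStreamSketch {

    // One multiperspective event: control flow (caseId, activity),
    // time (timestamp in ms) and data (a hypothetical numeric "amount").
    public record Event(String caseId, String activity, long timestamp, int amount) {}

    // Plays out one case of a strictly sequential model starting at `start`.
    public static List<Event> trace(String caseId, List<String> activities, long start, Random rnd) {
        List<Event> events = new ArrayList<>();
        long t = start;
        for (String activity : activities) {
            t += 1_000 + rnd.nextInt(5_000); // each step takes 1 s to just under 6 s
            events.add(new Event(caseId, activity, t, rnd.nextInt(1_000))); // amount in [0, 999]
        }
        return events;
    }

    // Emits nCases cases starting 10 s apart; cases with index >= driftAt
    // follow modelB instead of modelA (a sudden drift).
    public static List<Event> stream(List<String> modelA, List<String> modelB,
                                     int nCases, int driftAt, long seed) {
        Random rnd = new Random(seed);
        List<Event> out = new ArrayList<>();
        for (int i = 0; i < nCases; i++) {
            List<String> model = i < driftAt ? modelA : modelB;
            out.addAll(trace("case-" + i, model, i * 10_000L, rnd));
        }
        // Order all events by timestamp so that events of concurrently running
        // cases interleave, as on a live stream; List.sort is stable.
        out.sort(Comparator.comparingLong(Event::timestamp));
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = stream(List.of("register", "check", "pay"),
                                    List.of("register", "pay", "check"), 6, 3, 7L);
        for (Event e : events) {
            System.out.println(e);
        }
    }
}
```

Running `main` prints the events of six cases in timestamp order; cases 3 to 5 execute `pay` before `check`. A generator for the setting described above would presumably draw the models themselves at random (with choices and parallel branches) and could also produce gradual rather than sudden drifts.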