AILGSEJul 8, 2021

Bootstrapping Generalization of Process Models Discovered From Event Data

arXiv:2107.03876v21 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of quantifying generalization in process mining for industry-scale systems engineering, representing an incremental improvement by adapting existing statistical methods to a specific domain bottleneck.

The paper tackles the problem of estimating generalization in process models, which measures how well a model describes future system executions, by applying a bootstrap approach from computational statistics to define an estimator based on event logs. The result is a consistent estimator that reduces errors as log quality improves, with experiments confirming its applicability to industry-scale data-driven systems engineering.

Process mining extracts value from the traces recorded in the event logs of IT-systems, with process discovery the task of inferring a process model for a log emitted by some unknown system. Generalization is one of the quality criteria applied to process models to quantify how well the model describes future executions of the system. Generalization is also perhaps the least understood of those criteria, with that lack primarily a consequence of it measuring properties over the entire future behavior of the system when the only available sample of behavior is that provided by the log. In this paper, we apply a bootstrap approach from computational statistics, allowing us to define an estimator of the model's generalization based on the log it was discovered from. We show that standard process mining assumptions lead to a consistent estimator that makes fewer errors as the quality of the log increases. Experiments confirm the ability of the approach to support industry-scale data-driven systems engineering.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes