Ranking the Top-K Realizations of Stochastically Known Event Logs
This work addresses uncertainty in process mining for domains where data recording is flawed. Its contribution is incremental, since it builds on existing top-K proposals.
The paper tackles the problem of efficiently ranking the top-K most probable realizations of stochastically known event logs, which encode uncertainty about the recorded events. It implements an O(Kn) algorithm and shows that the benefit of such rankings depends on the log length and the distribution of event probabilities.
Various kinds of uncertainty can occur in event logs, e.g., due to flawed recording, data quality issues, or the use of probabilistic models for activity recognition. Stochastically known event logs make these uncertainties transparent by encoding multiple possible realizations for events. However, the number of realizations encoded by a stochastically known log grows exponentially with its size, making exhaustive exploration infeasible even for moderately sized event logs. Thus, considering only the top-K most probable realizations has been proposed in the literature. In this paper, we implement an efficient algorithm that calculates a top-K realization ranking of an event log under event independence in O(Kn) time, where n is the number of uncertain events in the log. We use this algorithm to investigate the benefit of top-K rankings over top-1 interpretations of stochastically known event logs. Specifically, we analyze the usefulness of top-K rankings with respect to different properties of the input data. We show that the benefit of a top-K ranking depends on the length of the input event log and the distribution of the event probabilities. The results highlight the potential of top-K rankings to enhance uncertainty-aware process mining techniques.
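To make the idea of a top-K realization ranking concrete, the following is a minimal Python sketch, not the paper's O(Kn) algorithm. It assumes that each uncertain event is given as a list of (label, probability) alternatives and that events are independent, so the probability of a realization is the product of the chosen alternatives' probabilities. It processes the events one at a time and keeps only the K most probable partial realizations, which suffices because every prefix of a top-K realization is itself among the top-K prefixes (up to ties). The names `top_k_realizations` and `covered_mass` are hypothetical, and the covered probability mass is only one illustrative way to compare a top-K ranking with a top-1 interpretation.

```python
import heapq
from typing import List, Sequence, Tuple

# Assumed input format: one list of (label, probability) pairs per uncertain event.
Event = Sequence[Tuple[str, float]]
Realization = Tuple[float, Tuple[str, ...]]


def top_k_realizations(events: Sequence[Event], k: int) -> List[Realization]:
    """Return the k most probable realizations, most probable first.

    Assumes independent events. This is a simple prefix-pruning sketch and
    is slower than the O(Kn) bound stated in the abstract.
    """
    best: List[Realization] = [(1.0, ())]  # the empty prefix has probability 1
    for alternatives in events:
        # Extend every kept prefix by every alternative of the next event.
        candidates = (
            (prob * p, labels + (label,))
            for prob, labels in best
            for label, p in alternatives
        )
        # Keep only the k most probable prefixes; nlargest returns them sorted.
        best = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return best


def covered_mass(ranking: Sequence[Realization]) -> float:
    """Total probability covered by a ranking (illustrative measure only)."""
    return sum(prob for prob, _ in ranking)


# Hypothetical log with two uncertain events.
log = [
    [("a", 0.7), ("b", 0.3)],
    [("c", 0.6), ("d", 0.4)],
]
ranking = top_k_realizations(log, k=3)
for prob, labels in ranking:
    print(labels, round(prob, 2))
```

In this toy log, the top-1 interpretation covers only part of the probability mass, and each additional rank adds less. How quickly the mass accumulates depends on the number of uncertain events and on how skewed their probabilities are, which matches the dependence described in the abstract.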