LGMar 28, 2023
CREATED: Generating Viable Counterfactual Sequences for Predictive Process AnalyticsOlusanmi Hundogan, Xixi Lu, Yupei Du et al.
Predictive process analytics focuses on predicting future states, such as the outcome of running process instances. These techniques often use machine learning models or deep learning models (such as LSTM) to make such predictions. However, these deep models are complex and difficult for users to understand. Counterfactuals answer ``what-if'' questions, which are used to understand the reasoning behind the predictions. For example, what if instead of emailing customers, customers are being called? Would this alternative lead to a different outcome? Current methods to generate counterfactual sequences either do not take the process behavior into account, leading to generating invalid or infeasible counterfactual process instances, or heavily rely on domain knowledge. In this work, we propose a general framework that uses evolutionary methods to generate counterfactual sequences. Our framework does not require domain knowledge. Instead, we propose to train a Markov model to compute the feasibility of generated counterfactual sequences and adapt three other measures (delta in outcome prediction, similarity, and sparsity) to ensure their overall viability. The evaluation shows that we generate viable counterfactual sequences, outperform baseline methods in viability, and yield similar results when compared to the state-of-the-art method that requires domain knowledge.
LGMar 17, 2022
The Analysis of Online Event Streams: Predicting the Next Activity for Anomaly DetectionSuhwan Lee, Xixi Lu, Hajo A. Reijers
Anomaly detection in process mining focuses on identifying anomalous cases or events in process executions. The resulting diagnostics are used to provide measures to prevent fraudulent behavior, as well as to derive recommendations for improving process compliance and security. Most existing techniques focus on detecting anomalous cases in an offline setting. However, to identify potential anomalies in a timely manner and take immediate countermeasures, it is necessary to detect event-level anomalies online, in real-time. In this paper, we propose to tackle the online event anomaly detection problem using next-activity prediction methods. More specifically, we investigate the use of both ML models (such as RF and XGBoost) and deep models (such as LSTM) to predict the probabilities of next-activities and consider the events predicted unlikely as anomalies. We compare these predictive anomaly detection methods to four classical unsupervised anomaly detection approaches (such as Isolation forest and LOF) in the online setting. Our evaluation shows that the proposed method using ML models tends to outperform the one using a deep model, while both methods outperform the classical unsupervised approaches in detecting anomalous events.
AIAug 28, 2023
Interactive Multi Interest Process Pattern DiscoveryMozhgan Vazifehdoostirani, Laura Genga, Xixi Lu et al.
Process pattern discovery methods (PPDMs) aim at identifying patterns of interest to users. Existing PPDMs typically are unsupervised and focus on a single dimension of interest, such as discovering frequent patterns. We present an interactive multi interest driven framework for process pattern discovery aimed at identifying patterns that are optimal according to a multi-dimensional analysis goal. The proposed approach is iterative and interactive, thus taking experts knowledge into account during the discovery process. The paper focuses on a concrete analysis goal, i.e., deriving process patterns that affect the process outcome. We evaluate the approach on real world event logs in both interactive and fully automated settings. The approach extracted meaningful patterns validated by expert knowledge in the interactive setting. Patterns extracted in the automated settings consistently led to prediction performance comparable to or better than patterns derived considering single interest dimensions without requiring user defined thresholds.
LGOct 13, 2023
Measuring the Stability of Process Outcome Predictions in Online SettingsSuhwan Lee, Marco Comuzzi, Xixi Lu et al.
Predictive Process Monitoring aims to forecast the future progress of process instances using historical event data. As predictive process monitoring is increasingly applied in online settings to enable timely interventions, evaluating the performance of the underlying models becomes crucial for ensuring their consistency and reliability over time. This is especially important in high risk business scenarios where incorrect predictions may have severe consequences. However, predictive models are currently usually evaluated using a single, aggregated value or a time-series visualization, which makes it challenging to assess their performance and, specifically, their stability over time. This paper proposes an evaluation framework for assessing the stability of models for online predictive process monitoring. The framework introduces four performance meta-measures: the frequency of significant performance drops, the magnitude of such drops, the recovery rate, and the volatility of performance. To validate this framework, we applied it to two artificial and two real-world event logs. The results demonstrate that these meta-measures facilitate the comparison and selection of predictive models for different risk-taking scenarios. Such insights are of particular value to enhance decision-making in dynamic business environments.
DBOct 16, 2020Code
Discovering Hierarchical Processes Using Flexible Activity Trees for Event AbstractionXixi Lu, Avigdor Gal, Hajo A. Reijers
Processes, such as patient pathways, can be very complex, comprising of hundreds of activities and dozens of interleaved subprocesses. While existing process discovery algorithms have proven to construct models of high quality on clean logs of structured processes, it still remains a challenge when the algorithms are being applied to logs of complex processes. The creation of a multi-level, hierarchical representation of a process can help to manage this complexity. However, current approaches that pursue this idea suffer from a variety of weaknesses. In particular, they do not deal well with interleaving subprocesses. In this paper, we propose FlexHMiner, a three-step approach to discover processes with multi-level interleaved subprocesses. We implemented FlexHMiner in the open source Process Mining toolkit ProM. We used seven real-life logs to compare the qualities of hierarchical models discovered using domain knowledge, random clustering, and flat approaches. Our results indicate that the hierarchical process models that the FlexHMiner generates compare favorably to approaches that do not exploit hierarchy.
AIOct 2, 2023
Using Reinforcement Learning to Optimize Responses in Care Processes: A Case Study on Aggression IncidentsBart J. Verhoef, Xixi Lu
Previous studies have used prescriptive process monitoring to find actionable policies in business processes and conducted case studies in similar domains, such as the loan application process and the traffic fine process. However, care processes tend to be more dynamic and complex. For example, at any stage of a care process, a multitude of actions is possible. In this paper, we follow the reinforcement approach and train a Markov decision process using event data from a care process. The goal was to find optimal policies for staff members when clients are displaying any type of aggressive behavior. We used the reinforcement learning algorithms Q-learning and SARSA to find optimal policies. Results showed that the policies derived from these algorithms are similar to the most frequent actions currently used but provide the staff members with a few more options in certain situations.
LGApr 8, 2024
HOEG: A New Approach for Object-Centric Predictive Process MonitoringTim K. Smit, Hajo A. Reijers, Xixi Lu
Predictive Process Monitoring focuses on predicting future states of ongoing process executions, such as forecasting the remaining time. Recent developments in Object-Centric Process Mining have enriched event data with objects and their explicit relations between events. To leverage this enriched data, we propose the Heterogeneous Object Event Graph encoding (HOEG), which integrates events and objects into a graph structure with diverse node types. It does so without aggregating object features, thus creating a more nuanced and informative representation. We then adopt a heterogeneous Graph Neural Network architecture, which incorporates these diverse object features in prediction tasks. We evaluate the performance and scalability of HOEG in predicting remaining time, benchmarking it against two established graph-based encodings and two baseline models. Our evaluation uses three Object-Centric Event Logs (OCELs), including one from a real-life process at a major Dutch financial institution. The results indicate that HOEG competes well with existing models and surpasses them when OCELs contain informative object attributes and event-object interactions.
LGAug 26, 2025
Working My Way Back to You: Resource-Centric Next-Activity PredictionKelly Kurowski, Xixi Lu, Hajo A Reijers
Predictive Process Monitoring (PPM) aims to train models that forecast upcoming events in process executions. These predictions support early bottleneck detection, improved scheduling, proactive interventions, and timely communication with stakeholders. While existing research adopts a control-flow perspective, we investigate next-activity prediction from a resource-centric viewpoint, which offers additional benefits such as improved work organization, workload balancing, and capacity forecasting. Although resource information has been shown to enhance tasks such as process performance analysis, its role in next-activity prediction remains unexplored. In this study, we evaluate four prediction models and three encoding strategies across four real-life datasets. Compared to the baseline, our results show that LightGBM and Transformer models perform best with an encoding based on 2-gram activity transitions, while Random Forest benefits most from an encoding that combines 2-gram transitions and activity repetition features. This combined encoding also achieves the highest average accuracy. This resource-centric approach could enable smarter resource allocation, strategic workforce planning, and personalized employee support by analyzing individual behavior rather than case-level progression. The findings underscore the potential of resource-centric next-activity prediction, opening up new venues for research on PPM.
CRJul 8, 2025
The Impact of Event Data Partitioning on Privacy-aware Process DiscoveryJungeun Lim, Stephan A. Fahrenkrog-Petersen, Xixi Lu et al.
Information systems support the execution of business processes. The event logs of these executions generally contain sensitive information about customers, patients, and employees. The corresponding privacy challenges can be addressed by anonymizing the event logs while still retaining utility for process discovery. However, trading off utility and privacy is difficult: the higher the complexity of event log, the higher the loss of utility by anonymization. In this work, we propose a pipeline that combines anonymization and event data partitioning, where event abstraction is utilized for partitioning. By leveraging event abstraction, event logs can be segmented into multiple parts, allowing each sub-log to be anonymized separately. This pipeline preserves privacy while mitigating the loss of utility. To validate our approach, we study the impact of event partitioning on two anonymization techniques using three real-world event logs and two process discovery techniques. Our results demonstrate that event partitioning can bring improvements in process discovery utility for directly-follows-based anonymization techniques.
AIJun 19, 2025
The Role of Explanation Styles and Perceived Accuracy on Decision Making in Predictive Process MonitoringSoobin Chae, Suhwan Lee, Hanna Hauptmann et al.
Predictive Process Monitoring (PPM) often uses deep learning models to predict the future behavior of ongoing processes, such as predicting process outcomes. While these models achieve high accuracy, their lack of interpretability undermines user trust and adoption. Explainable AI (XAI) aims to address this challenge by providing the reasoning behind the predictions. However, current evaluations of XAI in PPM focus primarily on functional metrics (such as fidelity), overlooking user-centered aspects such as their effect on task performance and decision-making. This study investigates the effects of explanation styles (feature importance, rule-based, and counterfactual) and perceived AI accuracy (low or high) on decision-making in PPM. We conducted a decision-making experiment, where users were presented with the AI predictions, perceived accuracy levels, and explanations of different styles. Users' decisions were measured both before and after receiving explanations, allowing the assessment of objective metrics (Task Performance and Agreement) and subjective metrics (Decision Confidence). Our findings show that perceived accuracy and explanation style have a significant effect.
AIJun 10, 2025
FoldA: Computing Partial-Order Alignments Using Directed Net UnfoldingsDouwe Geurtjens, Xixi Lu
Conformance checking is a fundamental task of process mining, which quantifies the extent to which the observed process executions match a normative process model. The state-of-the-art approaches compute alignments by exploring the state space formed by the synchronous product of the process model and the trace. This often leads to state space explosion, particularly when the model exhibits a high degree of choice and concurrency. Moreover, as alignments inherently impose a sequential structure, they fail to fully represent the concurrent behavior present in many real-world processes. To address these limitations, this paper proposes a new technique for computing partial-order alignments {on the fly using directed Petri net unfoldings, named FoldA. We evaluate our technique on 485 synthetic model-log pairs and compare it against Astar- and Dijkstra-alignments on 13 real-life model-log pairs and 6 benchmark pairs. The results show that our unfolding alignment, although it requires more computation time, generally reduces the number of queued states and provides a more accurate representation of concurrency.
DBApr 3, 2019
Identifying Patient Groups based on Frequent Patterns of Patient SamplesSeyed Amin Tabatabaei, Xixi Lu, Mark Hoogendoorn et al.
Grouping patients meaningfully can give insights about the different types of patients, their needs, and the priorities. Finding groups that are meaningful is however very challenging as background knowledge is often required to determine what a useful grouping is. In this paper we propose an approach that is able to find groups of patients based on a small sample of positive examples given by a domain expert. Because of that, the approach relies on very limited efforts by the domain experts. The approach groups based on the activities and diagnostic/billing codes within health pathways of patients. To define such a grouping based on the sample of patients efficiently, frequent patterns of activities are discovered and used to measure the similarity between the care pathways of other patients to the patients in the sample group. This approach results in an insightful definition of the group. The proposed approach is evaluated using several datasets obtained from a large university medical center. The evaluation shows F1-scores of around 0.7 for grouping kidney injury and around 0.6 for diabetes.
DBMay 3, 2017
The Imprecisions of Precision Measures in Process MiningNiek Tax, Xixi Lu, Natalia Sidorova et al.
In process mining, precision measures are used to quantify how much a process model overapproximates the behavior seen in an event log. Although several measures have been proposed throughout the years, no research has been done to validate whether these measures achieve the intended aim of quantifying over-approximation in a consistent way for all models and logs. This paper fills this gap by postulating a number of axioms for quantifying precision consistently for any log and any model. Further, we show through counter-examples that none of the existing measures consistently quantifies precision.