LG · Dec 13, 2022

Can recurrent neural networks learn process model structure?

Jari Peeperkorn, Seppe vanden Broucke, Jochen De Weerdt

arXiv:2212.06430v1 · 5.819 citations · h-index: 31 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a fundamental limitation in predictive process monitoring, offering researchers and practitioners incremental insight into what LSTMs can and cannot learn about process model structure.

The study investigated whether LSTMs can learn the underlying process model structure from event logs. It found that they often struggle, even with simple data and a lenient setup. Anti-overfitting measures can help, but the measures that help most are not the ones chosen when hyperparameters are selected on prediction accuracy alone.

Various methods using machine and deep learning have been proposed to tackle different tasks in predictive process monitoring, forecasting, for an ongoing case, e.g. the most likely next event or suffix, its remaining time, or an outcome-related variable. Recurrent neural networks (RNNs), and more specifically long short-term memory nets (LSTMs), stand out in terms of popularity. In this work, we investigate the capabilities of such an LSTM to actually learn the underlying process model structure of an event log. We introduce an evaluation framework that combines variant-based resampling and custom metrics for fitness, precision and generalization. We evaluate four hypotheses concerning the learning capabilities of LSTMs, the effect of overfitting countermeasures, the level of incompleteness in the training set and the level of parallelism in the underlying process model. We confirm that LSTMs can struggle to learn process model structure, even with simplistic process data and in a very lenient setup. Taking the correct anti-overfitting measures can alleviate the problem. However, these measures did not turn out to be optimal when hyperparameters were selected purely on prediction accuracy. We also found that decreasing the amount of information seen by the LSTM during training causes a sharp drop in generalization and precision scores. In our experiments, we could not identify a relationship between the extent of parallelism in the model and the generalization capability, but the results do indicate that the process's complexity might have an impact.
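The abstract mentions variant-based resampling and custom fitness, precision and generalization metrics without defining them. The Python sketch below shows one plausible reading, under stated assumptions: a trace is a tuple of activity labels, a variant is a distinct trace, whole variants are held out so that test variants never appear in training, and the three scores are simple set overlaps between the variants a model generates and the training or held-out variants. The function names (variant_split, variant_overlap_scores) and the overlap definitions are hypothetical and are not taken from the paper, whose actual metrics may differ.

```python
import random


def variant_split(log, test_fraction=0.2, seed=0):
    """Hold out whole variants: no test variant ever occurs in the training log.

    `log` is a list of traces, each a tuple of activity labels (assumption).
    """
    variants = sorted(set(log))
    rng = random.Random(seed)
    rng.shuffle(variants)
    # Keep at least one variant on each side when there are two or more.
    n_test = max(1, int(len(variants) * test_fraction))
    n_test = min(n_test, max(len(variants) - 1, 0))
    test_variants = set(variants[:n_test])
    train_log = [t for t in log if t not in test_variants]
    test_log = [t for t in log if t in test_variants]
    return train_log, test_log


def _ratio(numerator, denominator):
    # Guard against empty sets instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else 0.0


def variant_overlap_scores(generated, train_log, test_log):
    """Illustrative set-overlap scores (assumed definitions, not the paper's).

    fitness:        share of training variants the model reproduces
    generalization: share of held-out variants the model reproduces
    precision:      share of generated variants that are known valid variants
    """
    gen, train, test = set(generated), set(train_log), set(test_log)
    return {
        "fitness": _ratio(len(gen & train), len(train)),
        "generalization": _ratio(len(gen & test), len(test)),
        "precision": _ratio(len(gen & (train | test)), len(gen)),
    }
```

In this reading, `generated` would be the set of traces sampled from a trained LSTM. A model that has only memorized its training variants would score high on fitness and zero on generalization, which is the kind of failure the abstract describes.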

