A Discussion on Generalization in Next-Activity Prediction
This work highlights critical flaws in current evaluation practices for next-activity prediction and calls for more robust methodologies, a concern for researchers and practitioners in process mining and predictive analytics.
The paper identifies significant example leakage in the event logs commonly used for next-activity prediction and shows that trivial approaches perform nearly as well as deep learning methods. It proposes prediction scenarios that require different types of generalization to guide more robust evaluations.
Next-activity prediction aims to forecast the future behavior of running process instances. Recent publications in this field predominantly employ deep learning techniques and evaluate their prediction performance using publicly available event logs. This paper presents empirical evidence that calls into question the effectiveness of these evaluation approaches. We show that there is an enormous amount of example leakage in all of the commonly used event logs, so that rather trivial prediction approaches perform almost as well as those that leverage deep learning. We further argue that designing robust evaluations requires a more profound conceptual engagement with the topic of next-activity prediction, and specifically with the notion of generalization to new data. To this end, we present various prediction scenarios that necessitate different types of generalization to guide future research.
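To make the notion of example leakage concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes that a trace is a list of activity labels, that an "example" is a prefix of a trace paired with the activity that follows it, and that a test example counts as leaked when its prefix occurs verbatim in the training set. The function names (`prefixes`, `leakage_ratio`, `fit_lookup`, `lookup_accuracy`) and the toy traces are hypothetical and chosen for illustration only. The trivial predictor shown here memorizes the most frequent continuation of each training prefix.

```python
from collections import Counter, defaultdict


def prefixes(trace):
    """Yield (prefix, next_activity) pairs for a single trace."""
    for i in range(1, len(trace)):
        yield tuple(trace[:i]), trace[i]


def leakage_ratio(train_traces, test_traces):
    """Fraction of test prefixes that also occur verbatim in the training traces.

    Assumption: only the control flow (activity sequence) is compared;
    other event attributes are ignored.
    """
    seen = {p for t in train_traces for p, _ in prefixes(t)}
    test = [p for t in test_traces for p, _ in prefixes(t)]
    if not test:
        return 0.0
    return sum(p in seen for p in test) / len(test)


def fit_lookup(train_traces):
    """Trivial predictor: map each training prefix to its most frequent next activity."""
    table = defaultdict(Counter)
    for t in train_traces:
        for p, nxt in prefixes(t):
            table[p][nxt] += 1
    return {p: counts.most_common(1)[0][0] for p, counts in table.items()}


def lookup_accuracy(model, test_traces, fallback=None):
    """Accuracy of the lookup predictor; unseen prefixes receive `fallback`."""
    pairs = [(p, n) for t in test_traces for p, n in prefixes(t)]
    if not pairs:
        return 0.0
    return sum(model.get(p, fallback) == n for p, n in pairs) / len(pairs)


# Toy event log (hypothetical data, not taken from any public log).
train = [["a", "b", "c", "d"], ["a", "b", "c", "d"], ["a", "c", "b", "d"]]
test = [["a", "b", "c", "d"], ["a", "b", "d"]]

model = fit_lookup(train)
leak = leakage_ratio(train, test)
acc = lookup_accuracy(model, test)
print(f"leakage: {leak:.2f}, lookup accuracy: {acc:.2f}")
```

In this toy log every test prefix already appears in the training data, so the lookup table answers four of the five test examples correctly without any learned generalization. An evaluation dominated by such examples says little about how a model behaves on genuinely new data.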