A Deep Learning Approach for Repairing Missing Activity Labels in Event Logs for Process Mining
This work addresses a specific bottleneck in process mining, namely missing activity labels that reduce the accuracy of process discovery, but the contribution is incremental: it builds on prior repair methods by applying a deep learning approach.
The paper tackles missing activity labels in event logs, which hinder process discovery algorithms, by proposing an LSTM-based prediction model that takes the prefix and suffix sequences of each affected event, along with additional event-log attributes, as input. Evaluation on several publicly available datasets shows that it consistently outperforms existing methods at repairing missing labels.
Process mining is a relatively new field that bridges traditional process modeling and data mining. Process discovery, one of its most critical tasks, aims to discover process models automatically from event logs. The performance of existing process discovery algorithms can degrade when activity labels are missing from event logs. Several methods have been proposed to repair missing activity labels, but their accuracy can drop when a large number of labels are missing. In this paper, we propose an LSTM-based prediction model to predict the missing activity labels in event logs. The proposed model takes as input both the prefix and suffix sequences of each event with a missing activity label. Additional attributes of the event log are also used to improve performance. Our evaluation on several publicly available datasets shows that the proposed method performs consistently better than existing methods at repairing missing activity labels in event logs.
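To make the prefix and suffix input concrete, the following is a minimal sketch of how one training or inference sample could be built for each event with a missing label. It is not the authors' implementation: the window size, the padding scheme, the `<pad>` and `<unk>` tokens, and the function names `context_windows` and `build_samples` are all assumptions made for illustration. Additional event attributes and the LSTM itself are omitted.

```python
from typing import List, Optional, Tuple

PAD = "<pad>"      # hypothetical padding token (assumption, not from the paper)
MISSING = "<unk>"  # hypothetical placeholder for other missing labels in the context


def context_windows(
    trace: List[Optional[str]], index: int, window: int
) -> Tuple[List[str], List[str]]:
    """Return fixed-length (prefix, suffix) label windows around trace[index]."""

    def label(activity: Optional[str]) -> str:
        return MISSING if activity is None else activity

    prefix = [label(a) for a in trace[max(0, index - window):index]]
    suffix = [label(a) for a in trace[index + 1:index + 1 + window]]
    # Left-pad the prefix and right-pad the suffix so every sample has the same shape.
    prefix = [PAD] * (window - len(prefix)) + prefix
    suffix = suffix + [PAD] * (window - len(suffix))
    return prefix, suffix


def build_samples(
    trace: List[Optional[str]], window: int
) -> List[Tuple[int, List[str], List[str]]]:
    """Build one (position, prefix, suffix) sample per event whose label is None."""
    samples = []
    for i, activity in enumerate(trace):
        if activity is None:
            prefix, suffix = context_windows(trace, i, window)
            samples.append((i, prefix, suffix))
    return samples


# A toy trace in which None marks a missing activity label.
trace = ["register", None, "check", "approve", None, "notify"]
samples = build_samples(trace, window=2)
for position, prefix, suffix in samples:
    print(position, prefix, suffix)
```

In a full pipeline, each window would be integer-encoded and fed to the sequence model, with the known label serving as the target during training. Using both windows lets the prediction draw on events that occur after the gap as well as before it.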