LG · DB · Jan 5, 2023

Trace Encoding in Process Mining: a survey and benchmarking

arXiv:2301.02167v1 · 35 citations · h-index: 30
Originality: Synthesis-oriented
AI Analysis

The paper addresses unfair comparisons and performance gaps in process mining, a concern for both researchers and practitioners. Its contribution is incremental: it builds on existing methods by evaluating them systematically rather than proposing a new one.

This paper tackles the arbitrary and suboptimal use of encoding methods in process mining tasks. It surveys and benchmarks 27 trace encoding methods, evaluating them on expressivity, scalability, correlation, and domain agnosticism.

Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, and trace clustering. These methods are usually applied as preprocessing steps and are responsible for transforming complex information into a numerical feature space. Most papers choose existing encoding methods arbitrarily or employ a strategy based on domain-specific expert knowledge. Moreover, existing methods are typically used with their default hyperparameters, without evaluating other options. This practice can lead to several drawbacks, such as suboptimal performance and unfair comparisons with the state of the art. Therefore, this work aims to provide a comprehensive survey on event log encoding by comparing 27 methods of different natures in terms of expressivity, scalability, correlation, and domain agnosticism. To the best of our knowledge, this is the most comprehensive study so far focusing on trace encoding in process mining. It contributes to raising awareness of the role of trace encoding in process mining pipelines and sheds light on issues, concerns, and future research directions regarding the use of encoding methods to bridge the gap between machine learning models and process mining.
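To make "transforming complex information into a numerical feature space" concrete, here is a minimal sketch of one simple trace encoding, activity-frequency counts. It is illustrative only: the text above does not list the 27 benchmarked methods, so this is not presented as one of them. The toy event log and the function name `frequency_encode` are hypothetical.

```python
from collections import Counter


def frequency_encode(traces):
    """Encode each trace as a vector of activity counts.

    `traces` is a list of traces; each trace is the ordered list of
    activity names observed for one case (an assumed, simplified format).
    """
    # Build a fixed, sorted vocabulary so every vector shares one column order.
    vocabulary = sorted({activity for trace in traces for activity in trace})
    vectors = []
    for trace in traces:
        counts = Counter(trace)
        # Counter returns 0 for activities that do not occur in this trace.
        vectors.append([counts[activity] for activity in vocabulary])
    return vocabulary, vectors


# Hypothetical toy event log with two cases.
toy_log = [
    ["register", "check", "approve"],
    ["register", "check", "check", "reject"],
]
vocabulary, vectors = frequency_encode(toy_log)
print(vocabulary)
print(vectors)
```

This encoding discards the order of activities within a trace, so two traces with the same activities in a different sequence receive identical vectors. Trade-offs of this kind are one reason the choice of encoding can affect downstream tasks.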

Code Implementations: 1 repo