A Framework for Extracting and Encoding Features from Object-Centric Event Data
The paper addresses the need for accurate feature extraction in process mining for domains with complex, multi-object event data. The contribution is incremental: it builds on the established tabular and sequential encodings and adds a new graph-based one.
The paper tackles feature extraction from object-centric event data, which traditional process mining techniques cannot handle without lossy flattening. It introduces a framework that calculates features natively on the object-centric data and provides three encodings: tabular, sequential, and a novel graph-based one. Six use cases demonstrate the framework's utility in visualization and prediction.
Traditional process mining techniques take event data as input where each event is associated with exactly one object. An object represents the instantiation of a process. Object-centric event data contain events associated with multiple objects, expressing the interaction of multiple processes. Because traditional process mining techniques assume that each event is associated with exactly one object, they cannot be applied directly to object-centric event data. To use traditional process mining techniques, the object-centric event data are flattened by removing all object references but one. The flattening process is lossy, so features extracted from flattened data are inaccurate. Furthermore, the graph-like structure of object-centric event data is lost when flattening. In this paper, we introduce a general framework for extracting and encoding features from object-centric event data. We calculate features natively on the object-centric event data, leading to accurate measures. Furthermore, we provide three encodings for these features: tabular, sequential, and graph-based. While tabular and sequential encodings have been heavily used in process mining, the graph-based encoding is a new technique that preserves the structure of the object-centric event data. We provide six use cases: a visualization and a prediction use case for each of the three encodings. We use explainable AI in the prediction use cases to show the utility of both the object-centric features and the structure of the sequential and graph-based encodings for a predictive model.
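To make the flattening problem and the graph-based idea concrete, here is a minimal Python sketch on a toy log. It is an illustration under stated assumptions, not the paper's implementation: the event log, the object types, and the helper names (flatten, tabular_encoding, event_graph_edges) are all hypothetical, and connecting events that directly follow each other for a shared object is only one plausible way to build such a graph.

```python
from collections import defaultdict

# Hypothetical toy log: every event may reference several objects.
events = [
    {"id": "e1", "activity": "place order", "time": 1, "objects": {"o1", "i1", "i2"}},
    {"id": "e2", "activity": "pick item", "time": 2, "objects": {"i1"}},
    {"id": "e3", "activity": "pick item", "time": 3, "objects": {"i2"}},
    {"id": "e4", "activity": "ship order", "time": 4, "objects": {"o1", "i1", "i2"}},
]
object_types = {"o1": "order", "i1": "item", "i2": "item"}


def flatten(events, object_types, case_type):
    """Flatten the log: keep only references to objects of one type,
    producing one activity trace per object of that type."""
    traces = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["time"]):
        for obj in sorted(ev["objects"]):
            if object_types[obj] == case_type:
                traces[obj].append(ev["activity"])
    return dict(traces)


def tabular_encoding(events):
    """Tabular encoding (assumed shape): one row per event, with a feature
    computed natively on the object-centric data (number of objects)."""
    return [
        {"event": ev["id"], "activity": ev["activity"], "num_objects": len(ev["objects"])}
        for ev in sorted(events, key=lambda e: e["time"])
    ]


def event_graph_edges(events):
    """Graph-based encoding (assumed construction): add an edge between two
    events when they directly follow each other for at least one shared object."""
    last_event = {}  # object id -> id of the most recent event touching it
    edges = set()
    for ev in sorted(events, key=lambda e: e["time"]):
        for obj in ev["objects"]:
            if obj in last_event:
                edges.add((last_event[obj], ev["id"]))
            last_event[obj] = ev["id"]
    return edges


order_view = flatten(events, object_types, "order")  # drops both "pick item" events
item_view = flatten(events, object_types, "item")    # duplicates "place order" and "ship order"
rows = tabular_encoding(events)
edges = event_graph_edges(events)
```

In this toy log, flattening on the order type loses the two "pick item" events, while flattening on the item type replicates the single "place order" event into both item traces, so a simple activity count is already wrong in either view. The edge set, by contrast, keeps the information that e2 and e3 run concurrently between e1 and e4.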