Abel Armas-Cervantes

CR
6papers
40citations
Novelty44%
AI Score41

6 Papers

AIMar 26, 2022
Generalization in Automated Process Discovery: A Framework based on Event Log Patterns

Daniel Reißner, Abel Armas-Cervantes, Marcello La Rosa

The importance of quality measures in process mining has increased. One of the key quality aspects, generalization, is concerned with measuring the degree of overfitting of a process model w.r.t. an event log, since the recorded behavior is just an example of the true behavior of the underlying business process. Existing generalization measures exhibit several shortcomings that severely hinder their applicability in practice. For example, they assume the event log fully fits the discovered process model, and cannot deal with large real-life event logs and complex process models. More significantly, current measures neglect generalizations for clear patterns that demand a certain construct in the model. For example, a repeating sequence in an event log should be generalized with a loop structure in the model. We address these shortcomings by proposing a framework of measures that generalize a set of patterns discovered from an event log with representative traces and check the corresponding control-flow structures in the process model via their trace alignment. We instantiate the framework with a generalization measure that uses tandem repeats to identify repetitive patterns that are compared to the loop structures and a concurrency oracle to identify concurrent patterns that are compared to the parallel structures of the process model. In an extensive qualitative and quantitative evaluation using 74 log-model pairs using against two baseline generalization measures, we show that the proposed generalization measure consistently ranks process models that fulfil the observed patterns with generalizing control-flow structures higher than those which do not, while the baseline measures disregard those patterns. Further, we show that our measure can be efficiently computed for datasets two orders of magnitude larger than the largest dataset the baseline generalization measures can handle.

6.1CRApr 28
Secure Conformance Checking using Token-based Replay and Homomorphic Encryption

Luis-Armando Rodríguez-Flores, Luciano García-Bañuelos, Abel Armas-Cervantes et al.

Conformance checking, one of the main process mining operations, aims to identify discrepancies between a process model and an event log. The model represents the expected behaviour, whereas the event log represents the actual process behaviour as captured in information systems' records. Traditionally, the process model and the event log are both accessible to the business analyst performing the conformance checking. However, in some contexts the log's owner may want to protect critical or sensitive information in the log and still check its conformance with respect to a model belonging to another party. In this paper, we propose a secure approach to conformance checking based on the well-known token-based replay algorithm and homomorphic encryption. An evaluation is performed using a synthetic log, showing the practicality of the proposed technique.

6.5CRApr 30
A Privacy-Preserving Approach to Conformance Checking

Luis Rodríguez-Flores, Luciano García-Bañuelos, Abel Armas-Cervantes et al.

Conformance checking, one of the main process mining operations, aims to identify discrepancies between a process model and an event log. The model represents the expected behaviour, whereas the event log represents the actual process behaviour as captured in information systems records. Traditionally, the process model and the event log are both accessible to the business analyst performing the conformance checking. However, in some contexts, it is necessary to keep either the model or the log private to protect critical or sensitive information. In this paper, we propose a secure approach to conformance checking based on string processing algorithms and homomorphic encryption, where the process model and event log ar not visible to either the model's or event log's owner. The proposed technique is based on alignments, a well-known formalism used for conformance checking. An evaluation is performed using a synthetic and a real-world event log, showing that conformance checking can be securely computed at the expense of high memory and processing requirements.

SEApr 2, 2020
Efficient Conformance Checking using Approximate Alignment Computation with Tandem Repeats

Daniel Reißner, Abel Armas-Cervantes, Marcello La Rosa

Conformance checking encompasses a body of process mining techniques which aim to find and describe the differences between a process model capturing the expected process behavior and a corresponding event log recording the observed behavior. Alignments are an established technique to compute the distance between a trace in the event log and the closest execution trace of a corresponding process model. Given a cost function, an alignment is optimal when it contains the least number of mismatches between a log trace and a model trace. Determining optimal alignments, however, is computationally expensive, especially in light of the growing size and complexity of event logs from practice, which can easily exceed one million events with traces of several hundred activities. A common limitation of existing alignment techniques is the inability to exploit repetitions in the log. By exploiting a specific form of sequential pattern in traces, namely tandem repeats, we propose a novel approximate technique that uses pre- and post-processing steps to compress the length of a trace and recomputes the alignment cost while guaranteeing that the cost result never under-approximates the optimal cost. In an extensive empirical evaluation with 50 real-life model log pairs and against six state-of-the-art alignment techniques, we show that the proposed compression approach systematically outperforms the baselines by up to an order of magnitude in the presence of traces with repetitions, and that the cost over-approximation, when it occurs, is negligible.

SEOct 22, 2019
Scalable Alignment of Process Models and Event Logs: An Approach Based on Automata and S-Components

Daniel Reißner, Abel Armas-Cervantes, Raffaele Conforti et al.

Given a model of the expected behavior of a business process and an event log recording its observed behavior, the problem of business process conformance checking is that of identifying and describing the differences between the model and the log. A desirable feature of a conformance checking technique is to identify a minimal yet complete set of differences. Existing conformance checking techniques that fulfil this property exhibit limited scalability when confronted to large and complex models and logs. This paper presents two complementary techniques to address these shortcomings. The first technique transforms the model and log into two automata. These automata are compared using an error-correcting synchronized product, computed via an A* that guarantees the resulting automaton captures all differences with a minimal amount of error corrections. The synchronized product is used to extract minimal-length alignments between each trace of the log and the closest corresponding trace of the model. A limitation of the first technique is that as the level of concurrency in the model increases, the size of the automaton of the model grows exponentially, thus hampering scalability. To address this limitation, the paper proposes a second technique wherein the process model is first decomposed into a set of automata, known as S-components, such that the product of these automata is equal to the automaton of the whole process model. An error-correcting product is computed for each S-component separately and the resulting automata are recomposed into a single product automaton capturing all differences without minimality guarantees. An empirical evaluation shows that the proposed techniques outperform state-of-the-art baselines in terms of computational efficiency. Moreover, the decomposition-based technique is optimal for the vast majority of datasets and quasi-optimal for the remaining ones.

LGApr 8, 2018
Discovering Process Maps from Event Streams

Volodymyr Leno, Abel Armas-Cervantes, Marlon Dumas et al.

Automated process discovery is a class of process mining methods that allow analysts to extract business process models from event logs. Traditional process discovery methods extract process models from a snapshot of an event log stored in its entirety. In some scenarios, however, events keep coming with a high arrival rate to the extent that it is impractical to store the entire event log and to continuously re-discover a process model from scratch. Such scenarios require online process discovery approaches. Given an event stream produced by the execution of a business process, the goal of an online process discovery method is to maintain a continuously updated model of the process with a bounded amount of memory while at the same time achieving similar accuracy as offline methods. However, existing online discovery approaches require relatively large amounts of memory to achieve levels of accuracy comparable to that of offline methods. Therefore, this paper proposes an approach that addresses this limitation by mapping the problem of online process discovery to that of cache memory management, and applying well-known cache replacement policies to the problem of online process discovery. The approach has been implemented in .NET, experimentally integrated with the Minit process mining tool and comparatively evaluated against an existing baseline using real-life datasets.