Majid Rafiei

CR
h-index13
12papers
189citations
Novelty33%
AI Score26

12 Papers

LGMar 29, 2023
TraVaG: Differentially Private Trace Variant Generation Using GANs

Majid Rafiei, Frederik Wangelik, Mahsa Pourbafrani et al.

Process mining is rapidly growing in the industry. Consequently, privacy concerns regarding sensitive and private information included in event data, used by process mining algorithms, are becoming increasingly relevant. State-of-the-art research mainly focuses on providing privacy guarantees, e.g., differential privacy, for trace variants that are used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques for releasing trace variants still do not fulfill all the requirements of industry-scale usage. Moreover, providing privacy guarantees when there exists a high rate of infrequent trace variants is still a challenge. In this paper, we introduce TraVaG as a new approach for releasing differentially private trace variants based on \text{Generative Adversarial Networks} (GANs) that provides industry-scale benefits and enhances the level of privacy guarantees when there exists a high ratio of infrequent variants. Moreover, TraVaG overcomes shortcomings of conventional privacy preservation techniques such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data show that our approach outperforms state-of-the-art techniques in terms of privacy guarantees, plain data utility preservation, and result utility preservation.

LGOct 4, 2023
Extracting Rules from Event Data for Study Planning

Majid Rafiei, Duygu Bayrak, Mahsa Pourbafrani et al.

In this study, we examine how event data from campus management systems can be used to analyze the study paths of higher education students. The main goal is to offer valuable guidance for their study planning. We employ process and data mining techniques to explore the impact of sequences of taken courses on academic success. Through the use of decision tree models, we generate data-driven recommendations in the form of rules for study planning and compare them to the recommended study plan. The evaluation focuses on RWTH Aachen University computer science bachelor program students and demonstrates that the proposed course sequence features effectively explain academic performance measures. Furthermore, the findings suggest avenues for developing more adaptable study plans.

LGApr 8, 2025
Releasing Differentially Private Event Logs Using Generative Models

Frederik Wangelik, Majid Rafiei, Mahsa Pourbafrani et al.

In recent years, the industry has been witnessing an extended usage of process mining and automated event data analysis. Consequently, there is a rising significance in addressing privacy apprehensions related to the inclusion of sensitive and private information within event data utilized by process mining algorithms. State-of-the-art research mainly focuses on providing quantifiable privacy guarantees, e.g., via differential privacy, for trace variants that are used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques designed for the release of trace variants are still insufficient to meet all the demands of industry-scale utilization. Moreover, ensuring privacy guarantees in situations characterized by a high occurrence of infrequent trace variants remains a challenging endeavor. In this paper, we introduce two novel approaches for releasing differentially private trace variants based on trained generative models. With TraVaG, we leverage \textit{Generative Adversarial Networks} (GANs) to sample from a privatized implicit variant distribution. Our second method employs \textit{Denoising Diffusion Probabilistic Models} that reconstruct artificial trace variants from noise via trained Markov chains. Both methods offer industry-scale benefits and elevate the degree of privacy assurances, particularly in scenarios featuring a substantial prevalence of infrequent variants. Also, they overcome the shortcomings of conventional privacy preservation techniques, such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data demonstrate that our approaches surpass state-of-the-art techniques in terms of privacy guarantees and utility preservation.

SEMay 6, 2024
Process Variant Analysis Across Continuous Features: A Novel Framework

Ali Norouzifar, Majid Rafiei, Marcus Dees et al.

Extracted event data from information systems often contain a variety of process executions making the data complex and difficult to comprehend. Unlike current research which only identifies the variability over time, we focus on other dimensions that may play a role in the performance of the process. This research addresses the challenge of effectively segmenting cases within operational processes based on continuous features, such as duration of cases, and evaluated risk score of cases, which are often overlooked in traditional process analysis. We present a novel approach employing a sliding window technique combined with the earth mover's distance to detect changes in control flow behavior over continuous dimensions. This approach enables case segmentation, hierarchical merging of similar segments, and pairwise comparison of them, providing a comprehensive perspective on process behavior. We validate our methodology through a real-life case study in collaboration with UWV, the Dutch employee insurance agency, demonstrating its practical applicability. This research contributes to the field by aiding organizations in improving process efficiency, pinpointing abnormal behaviors, and providing valuable inputs for process comparison, and outcome prediction.

SEOct 6, 2021
Trustworthy Artificial Intelligence and Process Mining: Challenges and Opportunities

Andrew Pery, Majid Rafiei, Michael Simon et al.

The premise of this paper is that compliance with Trustworthy AI governance best practices and regulatory frameworks is an inherently fragmented process spanning across diverse organizational units, external stakeholders, and systems of record, resulting in process uncertainties and in compliance gaps that may expose organizations to reputational and regulatory risks. Moreover, there are complexities associated with meeting the specific dimensions of Trustworthy AI best practices such as data governance, conformance testing, quality assurance of AI model behaviors, transparency, accountability, and confidentiality requirements. These processes involve multiple steps, hand-offs, re-works, and human-in-the-loop oversight. In this paper, we demonstrate that process mining can provide a useful framework for gaining fact-based visibility to AI compliance process execution, surfacing compliance bottlenecks, and providing for an automated approach to analyze, remediate and monitor uncertainty in AI regulatory compliance processes.

CRJul 30, 2021
PC4PM: A Tool for Privacy/Confidentiality Preservation in Process Mining

Majid Rafiei, Alexander Schnitzler, Wil M. P. van der Aalst

Process mining enables business owners to discover and analyze their actual processes using event data that are widely available in information systems. Event data contain detailed information which is incredibly valuable for providing insights. However, such detailed data often include highly confidential and private information. Thus, concerns of privacy and confidentiality in process mining are becoming increasingly relevant and new techniques are being introduced. To make the techniques easily accessible, new tools need to be developed to integrate the introduced techniques and direct users to appropriate solutions based on their needs. In this paper, we present a Python-based infrastructure implementing and integrating state-of-the-art privacy/confidentiality preservation techniques in process mining. Our tool provides an easy-to-use web-based user interface for privacy-preserving data publishing, risk analysis, and data utility analysis. The tool also provides a set of anonymization operations that can be utilized to support privacy/confidentiality preservation. The tool manages both standard XES event logs and non-standard event data. We also store and manage privacy metadata to track the changes made by privacy/confidentiality preservation techniques.

CRJun 1, 2021
Privacy and Confidentiality in Process Mining -- Threats and Research Challenges

Gamal Elkoumy, Stephan A. Fahrenkrog-Petersen, Mohammadreza Fani Sani et al.

Privacy and confidentiality are very important prerequisites for applying process mining in order to comply with regulations and keep company secrets. This paper provides a foundation for future research on privacy-preserving and confidential process mining techniques. Main threats are identified and related to an motivation application scenario in a hospital context as well as to the current body of work on privacy and confidentiality in process mining. A newly developed conceptual model structures the discussion that existing techniques leave room for improvement. This results in a number of important research challenges that should be addressed by future process mining research.

CRMay 25, 2021
Privacy-Preserving Continuous Event Data Publishing

Majid Rafiei, Wil M. P. van der Aalst

Process mining enables organizations to discover and analyze their actual processes using event data. Event data can be extracted from any information system supporting operational processes, e.g., SAP. Whereas the data inside such systems is protected using access control mechanisms, the extracted event data contain sensitive information that needs to be protected. This creates a new risk and a possible inhibitor for applying process mining. Therefore, privacy issues in process mining become increasingly important. Several privacy preservation techniques have been introduced to mitigate possible attacks against static event data published only once. However, to keep the process mining results up-to-date, event data need to be published continuously. For example, a new log is created at the end of each week. In this paper, we elaborate on the attacks which can be launched against continuously publishing anonymized event data by comparing different releases, so-called correspondence attacks. Particularly, we focus on group-based privacy preservation techniques and show that provided privacy requirements can be degraded exploiting correspondence attacks. We apply the continuous event data publishing scenario to existing real-life event logs and report the anonymity indicators before and after launching the attacks.

DBMay 25, 2021
Group-Based Privacy Preservation Techniques for Process Mining

Majid Rafiei, Wil M. P. van der Aalst

Process mining techniques help to improve processes using event data. Such data are widely available in information systems. However, they often contain highly sensitive information. For example, healthcare information systems record event data that can be utilized by process mining techniques to improve the treatment process, reduce patient's waiting times, improve resource productivity, etc. However, the recorded event data include highly sensitive information related to treatment activities. Responsible process mining should provide insights about the underlying processes, yet, at the same time, it should not reveal sensitive information. In this paper, we discuss the challenges regarding directly applying existing well-known group-based privacy preservation techniques, e.g., k-anonymity, l-diversity, etc, to event data. We provide formal definitions of attack models and introduce an effective group-based privacy preservation technique for process mining. Our technique covers the main perspectives of process mining including control-flow, time, case, and organizational perspectives. The proposed technique provides interpretable and adjustable parameters to handle different privacy aspects. We employ real-life event data and evaluate both data utility and result utility to show the effectiveness of the privacy preservation technique. We also compare this approach with other group-based approaches for privacy-preserving event data publishing.

CRJan 4, 2021
Privacy-Preserving Data Publishing in Process Mining

Majid Rafiei, Wil M. P. van der Aalst

Process mining aims to provide insights into the actual processes based on event data. These data are often recorded by information systems and are widely available. However, they often contain sensitive private information that should be analyzed responsibly. Therefore, privacy issues in process mining are recently receiving more attention. Privacy preservation techniques obviously need to modify the original data, yet, at the same time, they are supposed to preserve the data utility. Privacy-preserving transformations of the data may lead to incorrect or misleading analysis results. Hence, new infrastructures need to be designed for publishing the privacy-aware event data whose aim is to provide metadata regarding the privacy-related transformations on event data without revealing details of privacy preservation techniques or the protected information. In this paper, we provide formal definitions for the main anonymization operations, used by privacy models in process mining. These are used to create an infrastructure for recording the privacy metadata. We advocate the proposed privacy metadata in practice by designing a privacy extension for the XES standard and a general data structure for event data which are not in the form of standard event logs.

DBDec 21, 2020
Towards Quantifying Privacy in Process Mining

Majid Rafiei, Wil M. P. van der Aalst

Process mining employs event logs to provide insights into the actual processes. Event logs are recorded by information systems and contain valuable information helping organizations to improve their processes. However, these data also include highly sensitive private information which is a major concern when applying process mining. Therefore, privacy preservation in process mining is growing in importance, and new techniques are being introduced. The effectiveness of the proposed privacy preservation techniques needs to be evaluated. It is important to measure both sensitive data protection and data utility preservation. In this paper, we propose an approach to quantify the effectiveness of privacy preservation techniques. We introduce two measures for quantifying disclosure risks to evaluate the sensitive data protection aspect. Moreover, a measure is proposed to quantify data utility preservation for the main process mining activities. The proposed measures have been tested using various real-life event logs.

CRSep 24, 2020
Practical Aspect of Privacy-Preserving Data Publishing in Process Mining

Majid Rafiei, Wil M. P. van der Aalst

Process mining techniques such as process discovery and conformance checking provide insights into actual processes by analyzing event data that are widely available in information systems. These data are very valuable, but often contain sensitive information, and process analysts need to balance confidentiality and utility. Privacy issues in process mining are recently receiving more attention from researchers which should be complemented by a tool to integrate the solutions and make them available in the real world. In this paper, we introduce a Python-based infrastructure implementing state-of-the-art privacy preservation techniques in process mining. The infrastructure provides a hierarchy of usages from single techniques to the collection of techniques, integrated as web-based tools. Our infrastructure manages both standard and non-standard event data resulting from privacy preservation techniques. It also stores explicit privacy metadata to track the modifications applied to protect sensitive data.