AI · Mar 16, 2023

Recommending the optimal policy by learning to act from temporal data

Stefano Branchi, Andrei Buliga, Chiara Di Francescomarino, Chiara Ghidini, Francesca Meneghello, Massimiliano Ronzani

arXiv:2303.09209v13.914 citations · h-index: 31

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of optimizing Key Performance Indicators in Process Mining when no explicit process models are available. The advance is incremental because it builds on existing RL techniques.

The paper tackles the problem of Prescriptive Process Monitoring by learning an optimal policy from temporal execution data using Reinforcement Learning, achieving results that compare with or surpass Deep RL approaches on real and synthetic datasets.

Prescriptive Process Monitoring is a prominent problem in Process Mining: it consists of identifying a set of actions to recommend with the goal of optimising a target measure of interest, or Key Performance Indicator (KPI). One challenge that makes this problem difficult is the need to provide Prescriptive Process Monitoring techniques based only on temporally annotated (process) execution data, stored in so-called execution logs, because well-crafted, human-validated explicit models are lacking. In this paper we propose an AI-based approach that learns, by means of Reinforcement Learning (RL), an optimal policy (almost) only from the observation of past executions and recommends the best activities to carry out to optimise a KPI of interest. This is achieved by first learning a Markov Decision Process for the specific KPIs from data, and then using RL training to learn the optimal policy. The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches. The ability of our approach to compare with, and often outperform, Deep RL approaches provides a contribution towards the exploitation of white-box RL techniques in scenarios where only temporal execution data are available.
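The two-step idea in the abstract (estimate a Markov Decision Process from logged executions, then derive a policy from it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy log, the encoding of each event as a `(state, action, reward, next_state)` tuple, and the function names `estimate_mdp` and `solve` are hypothetical. Tabular value iteration stands in for the RL training step, which the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical toy execution log. Each trace is a list of
# (state, action, reward, next_state) tuples; the reward plays the role of the KPI.
LOG = [
    [("start", "call", 0.0, "contacted"), ("contacted", "offer", 10.0, "end")],
    [("start", "call", 0.0, "contacted"), ("contacted", "offer", 0.0, "end")],
    [("start", "email", 0.0, "ignored"), ("ignored", "remind", 2.0, "end")],
    [("start", "call", 0.0, "ignored"), ("ignored", "remind", 2.0, "end")],
]


def estimate_mdp(log):
    """Estimate transition probabilities and mean rewards by counting log events."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for trace in log:
        for state, action, reward, next_state in trace:
            counts[(state, action)][next_state] += 1
            reward_sum[(state, action)] += reward
    transitions, rewards = {}, {}
    for key, successors in counts.items():
        total = sum(successors.values())
        transitions[key] = {s2: n / total for s2, n in successors.items()}
        rewards[key] = reward_sum[key] / total
    return transitions, rewards


def solve(transitions, rewards, gamma=1.0, sweeps=50):
    """Value iteration on the estimated MDP (a stand-in for RL training)."""
    values = defaultdict(float)  # states never seen as a source (e.g. "end") stay at 0
    q = {}
    for _ in range(sweeps):
        for (state, action), successors in transitions.items():
            expected_next = sum(p * values[s2] for s2, p in successors.items())
            q[(state, action)] = rewards[(state, action)] + gamma * expected_next
        best = {}
        for (state, _), value in q.items():
            if state not in best or value > best[state]:
                best[state] = value
        values.update(best)
    # Greedy policy: recommend the action with the highest estimated value per state.
    policy = {}
    for (state, action), value in q.items():
        if state not in policy or value > q[(state, policy[state])]:
            policy[state] = action
    return policy, q


transitions, rewards = estimate_mdp(LOG)
policy, q = solve(transitions, rewards)
print(policy)
```

On this toy log the estimated model recommends "call" from "start", since its expected KPI exceeds that of "email". Because the model is an explicit table of counts and values, each recommendation can be traced back to the logged executions that support it, which is the sense in which such an approach is white box.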
