LG, AI · Apr 15, 2025

A Rollout-Based Algorithm and Reward Function for Resource Allocation in Business Processes

Jeroen Middelhuis, Zaharah Bukhsh, Ivo Adan, Remco Dijkman

arXiv:2504.11250v2 · 4.1 · h-index: 12 · BPM

Originality: Incremental advance

AI Analysis

The work addresses inefficient resource allocation in dynamic business process environments, which matters for industries that depend on well-managed resources. The advance is incremental because it builds on existing DRL methods.

The paper tackles resource allocation in business processes by proposing a rollout-based Deep Reinforcement Learning (DRL) algorithm and a reward function that encodes the cycle-time minimization objective directly. The method learns the optimal policy in test scenarios where that policy can be computed, and it outperforms or matches the best heuristics on realistically sized process models.

Resource allocation plays a critical role in minimizing cycle time and improving the efficiency of business processes. Recently, Deep Reinforcement Learning (DRL) has emerged as a powerful technique to optimize resource allocation policies in business processes. In the DRL framework, an agent learns a policy through interaction with the environment, guided solely by reward signals that indicate the quality of its decisions. However, existing algorithms are not suitable for dynamic environments such as business processes. Furthermore, existing DRL-based methods rely on engineered reward functions that approximate the desired objective, but a misalignment between reward and objective can lead to undesired decisions or suboptimal policies. To address these issues, we propose a rollout-based DRL algorithm and a reward function to optimize the objective directly. Our algorithm iteratively improves the policy by evaluating execution trajectories following different actions. Our reward function directly decomposes the objective function of minimizing the cycle time, such that trial-and-error reward engineering becomes unnecessary. We evaluated our method in six scenarios, for which the optimal policy can be computed, and on a set of increasingly complex, realistically sized process models. The results show that our algorithm can learn the optimal policy for the scenarios and outperform or match the best heuristics on the realistically sized business processes.
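The abstract does not give the reward formula or the training loop, so the following is only a minimal sketch of the two ideas under stated assumptions. It relies on a standard identity: the sum of all cycle times equals the number of cases in the system integrated over time, so a reward of -(cases in system) × (time since the previous event) sums exactly to the negative total cycle time. The rollout step is reduced to a one-step lookahead with a deterministic single-resource simulator and a fixed first-come-first-served base policy; the paper's algorithm trains a DRL policy, which this toy does not do. All function names (`interval_rewards`, `simulate_single_server`, `rollout_action`) are hypothetical.

```python
from typing import List, Sequence


def interval_rewards(arrivals: Sequence[float], completions: Sequence[float]) -> List[float]:
    """Reward per inter-event interval: -(cases in system) * (interval length).

    Assumes at least one event. The rewards sum to minus the total cycle time.
    """
    # +1 at each arrival, -1 at each completion, processed in time order.
    events = sorted([(t, +1) for t in arrivals] + [(t, -1) for t in completions])
    rewards: List[float] = []
    in_system = 0
    prev_t = events[0][0]
    for t, delta in events:
        rewards.append(-in_system * (t - prev_t))
        in_system += delta
        prev_t = t
    return rewards


def simulate_single_server(order: Sequence[float]) -> float:
    """Return of one trajectory: all cases arrive at t=0 and one resource
    serves them in `order` (a list of processing times)."""
    t = 0.0
    completions = []
    for processing_time in order:
        t += processing_time
        completions.append(t)
    return sum(interval_rewards([0.0] * len(order), completions))


def rollout_action(waiting: List[float]) -> int:
    """Try each waiting case as the next one to serve, roll out the rest
    under the base policy (listed order), and keep the best-scoring choice."""
    best_idx, best_return = -1, float("-inf")
    for i, processing_time in enumerate(waiting):
        rest = waiting[:i] + waiting[i + 1:]
        ret = simulate_single_server([processing_time] + rest)
        if ret > best_return:
            best_idx, best_return = i, ret
    return best_idx


if __name__ == "__main__":
    # Cycle times are 3, 3 and 4, so the rewards sum to -10.
    print(sum(interval_rewards([0, 1, 2], [3, 4, 6])))
    # Serving the shortest case first minimizes total cycle time here.
    print(rollout_action([5.0, 1.0, 3.0]))
```

Because the rewards add up to the objective itself, comparing rollout returns is the same as comparing total cycle times, which is the sense in which no separate reward tuning is needed in this toy setting.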
