AI | Jul 24, 2025

Optimising Call Centre Operations using Reinforcement Learning: Value Iteration versus Proximal Policy Optimisation

arXiv:2507.18398v1 | h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses operational efficiency in call centers, but it is incremental: it applies existing RL methods to a specific domain.

The paper tackles call routing in call centers, aiming to reduce both client waiting time and staff idle time, by comparing two reinforcement learning methods: Value Iteration (VI) and Proximal Policy Optimization (PPO). Across 1,000 test episodes, PPO achieved the highest rewards and the lowest waiting and idle times.
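The Value Iteration side of this comparison assumes the system dynamics are known. The sketch below illustrates that idea on a hypothetical toy model, not the paper's: one multi-skilled agent, two client types with Poisson arrivals, exponential service and abandonment, a capacity of `K` waiting clients per type, and a per-step cost for each waiting client and for an idle agent. It converts the continuous-time dynamics into a discrete-time MDP by uniformisation and then runs standard discounted value iteration. Every parameter value, the state encoding and the cost weights are assumptions chosen for illustration.

```python
from functools import lru_cache

# Hypothetical toy model (not the paper's): one multi-skilled agent, two
# client types "A" and "B", and at most K waiting clients of each type.
LAM = {"A": 0.6, "B": 0.4}   # Poisson arrival rates (assumed values)
MU = {"A": 1.0, "B": 0.8}    # exponential service rates (assumed values)
THETA = 0.1                  # exponential abandonment rate per waiting client
K = 5                        # queue capacity per client type
C_WAIT, C_IDLE = 1.0, 1.0    # per-step cost of a waiting client / idle agent
GAMMA = 0.95                 # discount factor per uniformised step

NO_OP, SERVE_A, SERVE_B = 0, 1, 2
# Uniformisation constant: an upper bound on the total event rate in any state.
RATE = LAM["A"] + LAM["B"] + max(MU.values()) + 2 * K * THETA

# State (a, b, s): waiting A clients, waiting B clients, agent status
# (0 = idle, 1 = serving an A client, 2 = serving a B client).
STATES = [(a, b, s) for a in range(K + 1) for b in range(K + 1) for s in range(3)]


def actions(state):
    """Feasible routing actions; only an idle agent can pick a client."""
    a, b, s = state
    acts = [NO_OP]
    if s == 0 and a > 0:
        acts.append(SERVE_A)
    if s == 0 and b > 0:
        acts.append(SERVE_B)
    return acts


def post_decision(state, action):
    """State immediately after the routing decision, before any random event."""
    a, b, s = state
    if action == SERVE_A:
        return (a - 1, b, 1)
    if action == SERVE_B:
        return (a, b - 1, 2)
    return state


@lru_cache(maxsize=None)
def transitions(x):
    """Next-state probabilities from post-decision state x after one uniformised step."""
    a, b, s = x
    probs = {}

    def add(nxt, rate):
        probs[nxt] = probs.get(nxt, 0.0) + rate / RATE

    mu = MU["A"] if s == 1 else MU["B"] if s == 2 else 0.0
    add((min(a + 1, K), b, s), LAM["A"])      # arrivals to a full queue are blocked
    add((a, min(b + 1, K), s), LAM["B"])
    add((a, b, 0), mu)                        # service completion frees the agent
    add((max(a - 1, 0), b, s), a * THETA)     # abandonment from queue A
    add((a, max(b - 1, 0), s), b * THETA)     # abandonment from queue B
    add(x, RATE - (LAM["A"] + LAM["B"] + mu + (a + b) * THETA))  # fictitious self-loop
    return probs


def cost(x):
    a, b, s = x
    return C_WAIT * (a + b) + C_IDLE * (s == 0)


def q_value(V, state, action):
    """Reward (negative cost) plus discounted expected value of the next state."""
    x = post_decision(state, action)
    return -cost(x) + GAMMA * sum(p * V[n] for n, p in transitions(x).items())


def value_iteration(tol=1e-8):
    """Standard discounted value iteration; returns values and a greedy policy."""
    V = {st: 0.0 for st in STATES}
    while True:
        new_V = {st: max(q_value(V, st, ac) for ac in actions(st)) for st in STATES}
        delta = max(abs(new_V[st] - V[st]) for st in STATES)
        V = new_V
        if delta < tol:
            break
    policy = {st: max(actions(st), key=lambda ac: q_value(V, st, ac)) for st in STATES}
    return V, policy
```

For instance, `value_iteration()[1][(2, 1, 0)]` gives the greedy action for an idle agent facing two waiting A clients and one waiting B client. The design choice worth noting is that VI needs the full transition model (`transitions` here), which is exactly what a model-free method such as PPO avoids by learning from simulated experience instead.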

This paper investigates the application of Reinforcement Learning (RL) to optimise call routing in call centres to minimise client waiting time and staff idle time. Two methods are compared: a model-based approach using Value Iteration (VI) under known system dynamics, and a model-free approach using Proximal Policy Optimisation (PPO) that learns from experience. For the model-based approach, a theoretical model is used, while a simulation model combining Discrete Event Simulation (DES) with the OpenAI Gym environment is developed for model-free learning. Both models frame the problem as a Markov Decision Process (MDP) within a Skills-Based Routing (SBR) framework, with Poisson client arrivals and exponentially distributed service and abandonment times. For policy evaluation, random, VI, and PPO policies are all assessed using the simulation model. After 1,000 test episodes, PPO consistently achieves the highest rewards, along with the lowest client waiting time and staff idle time, despite requiring longer training time.
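The simulation-based policy evaluation described in the abstract (Poisson arrivals, exponentially distributed service and abandonment times, policies compared over test episodes) can be illustrated with a minimal discrete-event sketch in plain Python. It is not the authors' DES/OpenAI Gym environment and uses no Gym dependency: the two client types, the two agents' skill sets, all rates and the episode horizon are hypothetical, and a hand-written longest-queue rule stands in for the learned VI and PPO policies, which are not reproduced here.

```python
import heapq
import random
import statistics

# Hypothetical parameters for a toy skills-based routing (SBR) centre.
ARRIVAL = {"A": 1.2, "B": 0.8}   # Poisson arrival rate per client type
SERVICE = {"A": 1.5, "B": 1.0}   # exponential service rate per client type
PATIENCE = 0.5                   # exponential abandonment rate of waiting clients
SKILLS = [{"A"}, {"A", "B"}]     # agent 0 serves A only; agent 1 serves both
HORIZON = 100.0                  # simulated time per episode


def random_policy(queues, skills, rng):
    """Pick uniformly among the agent's skills that have waiting clients."""
    options = [c for c in sorted(skills) if queues[c]]
    return rng.choice(options) if options else None


def longest_queue_policy(queues, skills, rng):
    """Hypothetical baseline: serve the eligible type with the longest queue."""
    options = [c for c in sorted(skills) if queues[c]]
    return max(options, key=lambda c: len(queues[c])) if options else None


def simulate_episode(policy, rng):
    """Simulate one episode; return (mean wait of answered clients, total agent idle time)."""
    events, seq = [], 0

    def schedule(t, kind, data):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, data))  # seq breaks time ties
        seq += 1

    queues = {c: [] for c in ARRIVAL}   # FIFO lists of waiting client ids
    arrived_at = {}                     # client id -> arrival time, while waiting
    free = set(range(len(SKILLS)))
    idle_since = {ag: 0.0 for ag in free}
    waits, idle_total, next_id = [], 0.0, 0

    def dispatch(t):
        nonlocal idle_total
        for ag in sorted(free):         # iterate over a copy; `free` shrinks below
            ctype = policy(queues, SKILLS[ag], rng)
            if ctype is None:
                continue
            cid = queues[ctype].pop(0)
            waits.append(t - arrived_at.pop(cid))
            free.discard(ag)
            idle_total += t - idle_since[ag]
            schedule(t + rng.expovariate(SERVICE[ctype]), "done", ag)

    for ctype, rate in ARRIVAL.items():
        schedule(rng.expovariate(rate), "arrive", ctype)
    while events and events[0][0] <= HORIZON:
        t, _, kind, data = heapq.heappop(events)
        if kind == "arrive":
            cid, next_id = next_id, next_id + 1
            queues[data].append(cid)
            arrived_at[cid] = t
            schedule(t + rng.expovariate(ARRIVAL[data]), "arrive", data)
            schedule(t + rng.expovariate(PATIENCE), "abandon", (data, cid))
        elif kind == "abandon":
            ctype, cid = data
            if cid in arrived_at:       # still waiting, so the client hangs up
                queues[ctype].remove(cid)
                del arrived_at[cid]
        else:                           # "done": agent `data` finishes a call
            free.add(data)
            idle_since[data] = t
        dispatch(t)
    idle_total += sum(HORIZON - idle_since[ag] for ag in free)
    mean_wait = statistics.fmean(waits) if waits else 0.0
    return mean_wait, idle_total


def evaluate(policy, episodes=1000, seed=0):
    """Average (mean wait, idle time) over independent simulated episodes."""
    rng = random.Random(seed)
    results = [simulate_episode(policy, rng) for _ in range(episodes)]
    return (statistics.fmean(w for w, _ in results),
            statistics.fmean(i for _, i in results))
```

Calling `evaluate(random_policy)` and `evaluate(longest_queue_policy)` returns the average waiting time and idle time over 1,000 episodes, mirroring the shape (not the content) of the paper's comparison. In a Gym-style setup, the routing decision inside `dispatch` would instead be exposed as the environment's action, so that an agent such as PPO could learn it from reward.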

