LG SY OC · Sep 5, 2024

Differentiable Discrete Event Simulation for Queuing Network Control

arXiv:2409.03740v19.24 citations · h-index: 20

Originality: Highly original

AI Analysis

This work addresses congestion management in service systems, communication networks, and manufacturing processes, offering a scalable and flexible solution for realistic scenarios.

The paper tackles the challenge of controlling queuing networks in job-processing systems by proposing a differentiable discrete event simulation framework for policy optimization, which the authors report yields a 50-1000x improvement in sample efficiency over state-of-the-art RL methods.

Queuing network control is essential for managing congestion in job-processing systems such as service systems, communication networks, and manufacturing processes. Despite growing interest in applying reinforcement learning (RL) techniques, queueing network control poses distinct challenges, including high stochasticity, large state and action spaces, and lack of stability. To tackle these challenges, we propose a scalable framework for policy optimization based on differentiable discrete event simulation. Our main insight is that by implementing a well-designed smoothing technique for discrete event dynamics, we can compute pathwise policy gradients for large-scale queueing networks using auto-differentiation software (e.g., TensorFlow, PyTorch) and GPU parallelization. Through extensive empirical experiments, we observe that our policy gradient estimators are several orders of magnitude more accurate than typical REINFORCE-based estimators. In addition, we propose a new policy architecture, which drastically improves stability while maintaining the flexibility of neural-network policies. In a wide variety of scheduling and admission control tasks, we demonstrate that training control policies with pathwise gradients leads to a 50-1000x improvement in sample efficiency over state-of-the-art RL methods. Unlike prior tailored approaches to queueing, our methods can flexibly handle realistic scenarios, including systems operating in non-stationary environments and those with non-exponential interarrival/service times.
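To make the contrast between pathwise and REINFORCE-style gradients concrete, here is a minimal sketch on a much simpler problem than the one the paper studies. It is not the paper's estimator or smoothing technique. It assumes a single-server FIFO queue with exponential interarrival and service times, and a continuous parameter `theta` (the mean service time) in place of a neural-network policy, so no smoothing of discrete actions is needed. The pathwise derivative is propagated by hand through the Lindley recursion (classical infinitesimal perturbation analysis) rather than by auto-differentiation. The names `simulate` and `compare_estimators`, and all default values, are illustrative assumptions.

```python
import random
import statistics


def simulate(theta, lam, n_jobs, rng):
    """Simulate one sample path of a single-server FIFO queue.

    theta is the mean service time and lam the arrival rate (both assumed
    for illustration). Returns (mean_wait, pathwise_grad, score_grad);
    both gradients estimate d E[mean_wait] / d theta from this one path.
    """
    wait = 0.0    # waiting time of the current job (first job waits 0)
    dwait = 0.0   # derivative of that waiting time w.r.t. theta
    total_wait = 0.0
    total_dwait = 0.0
    score = 0.0   # d/d theta of the log-likelihood of the service times
    for _ in range(n_jobs):
        unit = rng.expovariate(1.0)   # Exp(1) noise; service = theta * unit
        gap = rng.expovariate(lam)    # time until the next arrival
        slack = wait + theta * unit - gap
        if slack > 0.0:
            # Next job must wait: the derivative flows through the recursion.
            wait, dwait = slack, dwait + unit
        else:
            # Next job finds the server idle: dependence on theta resets.
            wait, dwait = 0.0, 0.0
        total_wait += wait
        total_dwait += dwait
        # Score of an exponential service time with mean theta.
        score += (unit - 1.0) / theta
    mean_wait = total_wait / n_jobs
    return mean_wait, total_dwait / n_jobs, mean_wait * score


def compare_estimators(theta=0.5, lam=1.0, n_jobs=200, n_paths=500, seed=0):
    """Compare mean and variance of the two estimators over many paths."""
    rng = random.Random(seed)
    runs = [simulate(theta, lam, n_jobs, rng) for _ in range(n_paths)]
    pathwise = [r[1] for r in runs]
    score_fn = [r[2] for r in runs]
    return {
        "pathwise_mean": statistics.fmean(pathwise),
        "pathwise_var": statistics.variance(pathwise),
        "reinforce_mean": statistics.fmean(score_fn),
        "reinforce_var": statistics.variance(score_fn),
    }


if __name__ == "__main__":
    for name, value in compare_estimators().items():
        print(f"{name}: {value:.3f}")
```

Both estimators target the same derivative, but the score-function version multiplies the whole-path cost by a sum of per-job score terms, so its variance grows with the path length, while the pathwise version only tracks how each waiting time moves with `theta`. In a full queueing network the control actions are discrete, which is where a smoothing step such as the one the abstract describes would be needed before pathwise gradients can be computed.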
