Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic
This work addresses routing inefficiencies for network operators by handling non-Markovian traffic and the spatial dependencies in network topologies. The contribution is incremental, since it builds on existing RL methods.
The paper tackles packet routing in communication networks with non-Markovian traffic and unmodeled spatial structure. The resulting spatial-temporal RL (STRL) framework outperforms traditional baselines by more than 19% during training and 7% for inference despite a change in network topology.
Reinforcement Learning (RL) has been widely used for packet routing in communication networks, but traditional RL methods rely on the Markov assumption that the current state contains all necessary information for decision-making. In reality, internet traffic is non-Markovian, and past states do influence routing performance. Moreover, common deep RL approaches use function approximators, such as neural networks, that do not model the spatial structure in network topologies. To address these shortcomings, we design a network environment with non-Markovian traffic and introduce a spatial-temporal RL (STRL) framework for packet routing. Our approach outperforms traditional baselines by more than 19% during training and 7% for inference despite a change in network topology.
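The two shortcomings named above can be illustrated with a small sketch. It is not the STRL architecture, which the abstract does not specify. It only shows, under stated assumptions, (a) a traffic source whose next arrival depends on a window of past arrivals, so the latest observation alone is not a sufficient state, and (b) a state built by stacking a short history of queue lengths and averaging each node's features with those of its neighbors. All names (`make_bursty_source`, `neighbor_mean`, `spatial_temporal_state`) and parameters (`window`, `base`, `boost`) are hypothetical.

```python
from collections import deque
import random


def make_bursty_source(window=3, base=0.2, boost=0.6, seed=0):
    """Hypothetical traffic source (an assumption, not the paper's model).

    The arrival probability depends on the last `window` arrivals, so the
    process is not Markovian in the most recent arrival alone.
    """
    rng = random.Random(seed)
    history = deque([0] * window, maxlen=window)

    def step():
        # More recent arrivals raise the chance of another arrival (burstiness).
        p = base + boost * sum(history) / window
        arrival = 1 if rng.random() < p else 0
        history.append(arrival)
        return arrival

    return step


def neighbor_mean(features, adjacency):
    """One round of mean aggregation over each node and its neighbors."""
    out = []
    for node, neighbors in enumerate(adjacency):
        group = [node] + list(neighbors)
        dim = len(features[node])
        out.append(
            [sum(features[g][d] for g in group) / len(group) for d in range(dim)]
        )
    return out


def spatial_temporal_state(queue_history, adjacency):
    """Build per-node features from a history of queue-length snapshots.

    queue_history: list of snapshots (oldest first), each a list with one
    queue length per node. The temporal part stacks the history per node;
    the spatial part averages over the node's neighborhood.
    """
    num_nodes = len(adjacency)
    per_node = [[snap[n] for snap in queue_history] for n in range(num_nodes)]
    return neighbor_mean(per_node, adjacency)


# Usage: a 3-node line topology 0 - 1 - 2 with two past snapshots.
adjacency = [[1], [0, 2], [1]]
queue_history = [[0, 2, 4], [2, 4, 6]]
state = spatial_temporal_state(queue_history, adjacency)
```

Because the features are computed from the adjacency list rather than from fixed node positions, the same function can be applied when the topology changes. A learned model in this spirit would replace the plain averaging with trainable layers.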