LG | SP | Feb 8, 2024

Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization

arXiv:2402.05476v18 citations | h-index: 5 | IEEE Transactions on Signal Processing
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in reinforcement learning for network control, offering incremental improvements in policy optimization for large-scale Markov decision processes.

The paper tackles the performance and complexity challenges of Q-learning in large networks by proposing a model-free ensemble RL algorithm that runs multiple Q-learning instances on synthetic Markovian environments and fuses their outputs with an adaptive weighting mechanism. The algorithm achieves up to 55% lower average policy error and up to 50% lower runtime complexity than state-of-the-art Q-learning algorithms.

Reinforcement learning (RL) is a classical tool for solving network control or policy optimization problems in unknown environments. The original Q-learning suffers from performance and complexity challenges across very large networks. Herein, a novel model-free ensemble reinforcement learning algorithm that adapts classical Q-learning is proposed to handle these challenges for networks that admit Markov decision process (MDP) models. Multiple Q-learning algorithms are run in parallel on multiple, distinct, synthetically created and structurally related Markovian environments; the outputs are fused using an adaptive weighting mechanism based on the Jensen-Shannon divergence (JSD) to obtain an approximately optimal policy with low complexity. The theoretical justification of the algorithm, including the convergence of key statistics and Q-functions, is provided. Numerical results across several network models show that the proposed algorithm can achieve up to 55% less average policy error with up to 50% less runtime complexity than state-of-the-art Q-learning algorithms. The numerical results also validate the assumptions made in the theoretical analysis.
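
To make the ensemble idea in the abstract concrete, here is a minimal Python sketch of running tabular Q-learning on several related environments and fusing the resulting Q-tables with JSD-based weights. It is not the paper's algorithm. Every specific choice is an assumption made for illustration: building the synthetic environments as powers of the transition matrices (`synthetic_env`), turning each Q-table into a distribution with a softmax, using the first environment as the reference, and weighting by `softmax(-JSD)`. All function names and the random MDP are hypothetical.

```python
import numpy as np


def q_learning(P, R, gamma=0.9, alpha=0.1, eps=0.2, steps=5000, seed=0):
    """Tabular Q-learning on an MDP with transitions P[a, s, s'] and rewards R[s, a]."""
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        s_next = int(rng.choice(n_states, p=P[a, s]))
        # Standard Q-learning temporal-difference update.
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q


def synthetic_env(P, n):
    """ASSUMPTION: derive a related environment from the n-th power of each P[a]."""
    Pn = np.stack([np.linalg.matrix_power(P[a], n) for a in range(P.shape[0])])
    return Pn / Pn.sum(axis=-1, keepdims=True)  # re-normalize against rounding error


def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()


def kl(p, q, tiny=1e-12):
    """Kullback-Leibler divergence between two discrete distributions."""
    return float(np.sum(p * np.log((p + tiny) / (q + tiny))))


def jsd(p, q):
    """Jensen-Shannon divergence (natural log, so the value lies in [0, ln 2])."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def fuse(q_tables):
    """ASSUMPTION: weight each Q-table by softmax(-JSD) relative to the first one."""
    ref = softmax(q_tables[0].ravel())
    d = np.array([jsd(ref, softmax(Q.ravel())) for Q in q_tables])
    w = softmax(-d)
    fused = sum(wi * Q for wi, Q in zip(w, q_tables))
    return fused, w


# Small random MDP used only to exercise the functions above.
rng = np.random.default_rng(42)
n_states, n_actions = 6, 2
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # shape (A, S, S)
R = rng.random((n_states, n_actions))

q_tables = [q_learning(synthetic_env(P, n), R, seed=n) for n in (1, 2, 3)]
Q_fused, weights = fuse(q_tables)
policy = Q_fused.argmax(axis=1)  # greedy policy from the fused Q-table
```

In this sketch an environment whose Q-table diverges more from the reference gets a smaller weight, which is one simple way to read "adaptive weighting based on the JSD". The paper's actual weighting rule and environment construction may differ.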

Code Implementations: 1 repo