Learning Selective Merge Policies for Deadline-Constrained Coded Caching via Deep Reinforcement Learning

arXiv:2605.1523619.3

Predicted impact: top 67% in IT (last 90 days) · Originality: Incremental advance

AI Analysis

For delay-sensitive applications such as video streaming, this work provides a deep reinforcement learning (DRL) solution that improves delivery efficiency under strict deadlines, and shows that selective merging outperforms aggressive merging.

The paper tackles deadline-constrained coded caching by using deep reinforcement learning to learn selective merge policies. The proposed method reduces the broadcast-packet expiration ratio by 40.9% (0.208 vs. 0.352) relative to the best baseline and achieves the best broadcast-efficiency score among coded multicasting methods.
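The 40.9% figure is a relative reduction of a lower-is-better metric. As a quick arithmetic check, the sketch below assumes the usual definition: baseline minus proposed, divided by baseline. The helper name `relative_reduction` is hypothetical. It reproduces the reported value from the two expiration ratios.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # Expiration ratios reported for the best baseline and the learned policy.
    r = relative_reduction(0.352, 0.208)
    print(f"{100 * r:.1f}%")  # prints 40.9%
```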

With coded caching, the server can exploit the content users have already cached to serve multiple users at once with a single coded multicast message (the merged message), thereby relieving peak network load. For delay-sensitive applications such as video streaming, however, the server must decide online which messages to merge while respecting a strict deadline on each request. The difficulty is that a merge that helps form the current coded multicast message can be harmful to subsequent ones. We propose a DRL-based solution that formulates deadline-constrained coded delivery as a masked discrete-action queue-state control problem and trains a graph-attention policy network via proximal policy optimization (PPO). On the uniform-demand benchmark, the policy network reduces the broadcast-packet expiration ratio $\rho$ by $40.9\%$ ($0.208$ vs. $0.352$) relative to the best coded multicasting baseline (SACM++). It also attains the best broadcast-efficiency score $\sigma$ across the Track A battery among the coded multicasting methods. We further observe that, for users with tight deadlines, selective merging outperforms aggressive merging: the policy network learns to merge at a rate of only $\approx 31.8\%$. This observation holds across variations within the same simulator family.
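To make the "merged message" and the "masked discrete action" ideas concrete, here is a minimal sketch under stated assumptions. It uses the classic two-user pairwise XOR merge from coded caching and a toy action space in which each action either sends one queued request uncoded or XOR-merges a pair. A mask disables expired requests and pairs that the users could not decode. All names (`Request`, `build_actions`, `masked_softmax`, `xor_bytes`) are hypothetical. The uniform logits are placeholders for the output of the paper's graph-attention policy network, which is not reproduced here. The paper's actual state, action, and mask definitions may differ.

```python
import math
import random
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Request:
    user: int      # index of the requesting user
    subfile: str   # hypothetical identifier of the missing subfile
    deadline: int  # last time slot in which delivery is still useful


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length payloads into one coded (merged) message."""
    if len(a) != len(b):
        raise ValueError("payloads must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def build_actions(queue, caches, now):
    """Enumerate toy discrete actions for the queue state plus a validity mask.

    ("single", i) sends request i uncoded; ("pair", i, j) sends one XOR-merged
    message serving requests i and j. A pair is valid only if each user already
    caches the other's subfile (so it can cancel it out) and neither has expired.
    """
    actions, mask = [], []
    for i, req in enumerate(queue):
        actions.append(("single", i))
        mask.append(req.deadline >= now)
    for i, j in combinations(range(len(queue)), 2):
        ri, rj = queue[i], queue[j]
        alive = ri.deadline >= now and rj.deadline >= now
        decodable = rj.subfile in caches[ri.user] and ri.subfile in caches[rj.user]
        actions.append(("pair", i, j))
        mask.append(alive and decodable)
    return actions, mask


def masked_softmax(logits, mask):
    """Softmax over valid actions only; masked actions get probability 0.

    Assumes at least one action is valid.
    """
    top = max(logit for logit, ok in zip(logits, mask) if ok)
    exps = [math.exp(logit - top) if ok else 0.0 for logit, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Two users, files A and B split into halves: user 0 caches A1, B1 and
    # user 1 caches A2, B2. User 0 wants A (missing A2); user 1 wants B (missing B1).
    files = {"A2": b"\x0a\x0b", "B1": b"\x1c\x1d"}
    caches = {0: {"A1", "B1"}, 1: {"A2", "B2"}}
    queue = [Request(0, "A2", deadline=3), Request(1, "B1", deadline=1)]

    actions, mask = build_actions(queue, caches, now=0)
    logits = [0.0] * len(actions)  # placeholder for a learned policy's scores
    probs = masked_softmax(logits, mask)
    choice = random.choices(actions, weights=probs, k=1)[0]
    print(choice)

    coded = xor_bytes(files["A2"], files["B1"])  # one broadcast serves both users
    assert xor_bytes(coded, files["B1"]) == files["A2"]  # user 0 uses cached B1
    assert xor_bytes(coded, files["A2"]) == files["B1"]  # user 1 uses cached A2
```

Masking at the distribution level keeps the action space fixed in size, which is one common way to apply PPO to actions whose feasibility changes with the state. The trade-off the paper studies, whether merging now hurts later merges, lives in how the learned logits rank the valid actions. This toy example does not model it.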
