May 26, 2020

Self-play Reinforcement Learning for Video Transmission

arXiv:2005.12788v1, 16 citations
Originality: Incremental advance
AI Analysis

This paper addresses the mismatch between the objective that video transmission algorithms are optimized for and the requirement they are actually meant to satisfy. The contribution is incremental because it builds on existing adaptive algorithms.

The paper tackles the problem of inaccurate optimization functions in video transmission by proposing Zwei, a self-play reinforcement learning algorithm that updates its policy directly from the actual requirement. The authors report that Zwei outperforms state-of-the-art methods in all considered scenarios.

Video transmission services adopt adaptive algorithms to meet users' demands. Existing techniques are often optimized and evaluated with a function that linearly combines several weighted metrics. Nevertheless, we observe that such a function fails to describe the requirement accurately, so methods optimized for it may eventually violate the original needs. To eliminate this concern, we propose Zwei, a self-play reinforcement learning algorithm for video transmission tasks. Zwei aims to update the policy by directly utilizing the actual requirement. Technically, Zwei samples a number of trajectories from the same starting point and instantly estimates the win rate with respect to the competition outcome, where the competition result indicates which trajectory is closer to the assigned requirement. Zwei then optimizes the strategy by maximizing the win rate. To build Zwei, we develop simulation environments, design adequate neural network models, and devise training methods for handling different requirements in various video transmission scenarios. Trace-driven analysis over two representative tasks demonstrates that Zwei faithfully optimizes itself according to the assigned requirement, outperforming state-of-the-art methods in all considered scenarios.
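To make the win-rate mechanism concrete, here is a minimal Python sketch under loudly stated assumptions. Everything in it is hypothetical and not taken from the paper: a toy bitrate-selection environment (`BITRATES`, `CHUNK_SEC`, `rollout`), a lexicographic requirement ("fewer stalls first, then higher average bitrate") encoded only as the pairwise comparison `beats`, and a state-independent softmax policy updated with plain REINFORCE using each trajectory's win rate as its return. It does not reproduce Zwei's neural networks, simulators, or optimizer; it only illustrates sampling a group of trajectories from one starting point, scoring each by how often it wins pairwise competitions against the others, and shifting the policy toward trajectories that win more often.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass

# Hypothetical toy setting (not the paper's simulator): each step the player
# picks one of BITRATES (Mbps) for a CHUNK_SEC-second chunk over a fixed link.
BITRATES = [0.3, 1.2, 2.8]
CHUNK_SEC = 4.0


@dataclass
class Outcome:
    avg_bitrate: float  # Mbps, higher is better
    stall_sec: float    # total rebuffering time in seconds, lower is better


def beats(a: Outcome, b: Outcome, tol: float = 1e-6) -> float:
    """Hypothetical requirement: fewer stalls first, then higher bitrate.
    Returns 1.0 if `a` is closer to it than `b`, 0.0 if `b` is, 0.5 on a tie."""
    if abs(a.stall_sec - b.stall_sec) > tol:
        return 1.0 if a.stall_sec < b.stall_sec else 0.0
    if abs(a.avg_bitrate - b.avg_bitrate) > tol:
        return 1.0 if a.avg_bitrate > b.avg_bitrate else 0.0
    return 0.5


def win_rates(outcomes: list[Outcome]) -> list[float]:
    """Each trajectory's win rate against the others sampled from the same start."""
    n = len(outcomes)
    return [
        sum(beats(outcomes[i], outcomes[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def rollout(probs: list[float], bandwidth: float, steps: int,
            rng: random.Random, start_buffer: float = 4.0) -> tuple[list[int], Outcome]:
    """Play one episode from a fixed starting buffer; return actions and outcome."""
    buffer, stall, actions = start_buffer, 0.0, []
    for _ in range(steps):
        a = rng.choices(range(len(BITRATES)), weights=probs)[0]
        actions.append(a)
        download = BITRATES[a] * CHUNK_SEC / bandwidth  # seconds to fetch the chunk
        stall += max(0.0, download - buffer)            # playback stalls if buffer drains
        buffer = max(0.0, buffer - download) + CHUNK_SEC
    avg = sum(BITRATES[a] for a in actions) / steps
    return actions, Outcome(avg, stall)


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(bandwidth: float = 1.5, steps: int = 8, group: int = 16,
          iters: int = 300, lr: float = 0.5, seed: int = 0) -> list[float]:
    """Self-play loop with a state-independent softmax policy (a simplification)."""
    rng = random.Random(seed)
    logits = [0.0] * len(BITRATES)
    for _ in range(iters):
        probs = softmax(logits)
        # Sample a group of trajectories from the same starting point.
        episodes = [rollout(probs, bandwidth, steps, rng) for _ in range(group)]
        rates = win_rates([outcome for _, outcome in episodes])
        grad = [0.0] * len(logits)
        for (actions, _), rate in zip(episodes, rates):
            advantage = rate - 0.5  # group win rates always average exactly 0.5
            for a in actions:
                for k in range(len(logits)):
                    # d log pi(a) / d logit_k = 1[k == a] - pi(k)
                    grad[k] += advantage * ((1.0 if k == a else 0.0) - probs[k])
        # Gradient ascent on the expected win rate (REINFORCE estimate).
        logits = [x + lr * g / group for x, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    print([round(p, 3) for p in train()])
```

Because `beats(a, b) + beats(b, a) == 1` for every pair, the win rates within a group always average exactly 0.5, which is why 0.5 serves as the baseline. Note that the requirement enters only through the comparison: no weights between bitrate and stall time are ever chosen, which is the point the abstract makes against linearly weighted objectives.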

Code Implementations: 1 repo
