ModeNet: Mode Selection Network For Learned Video Coding
This work addresses video compression efficiency for applications such as streaming and storage. Its contribution is incremental, as it builds on existing learned video coding methods.
The paper proposes ModeNet, a mode selection network for deep learning-based video compression that enables competition among coding modes and assigns each pixel to the most suitable one. ModeNet-based systems achieve compelling performance under CLIC20 P-frame coding conditions.
In this paper, a mode selection network (ModeNet) is proposed to enhance deep learning-based video compression. Inspired by traditional video coding, the purpose of ModeNet is to enable competition among several coding modes. The proposed ModeNet learns and conveys a pixel-wise partitioning of the frame, which is used to assign each pixel to the most suitable coding mode. ModeNet is trained alongside the different coding modes to minimize a rate-distortion cost. It is a flexible component that can be generalized to other systems to allow competition between different coding tools. The benefit of ModeNet is studied on a P-frame coding task, where it is used to design a method for coding a frame given its prediction. ModeNet-based systems achieve compelling performance when evaluated under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding track conditions.
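To make the idea of pixel-wise mode competition concrete, the following is a minimal sketch and not the method of the paper. It replaces the learned network with an exhaustive per-pixel search that picks the candidate with the lowest Lagrangian cost, distortion plus lambda times rate. The mode names `skip` and `coded`, the per-pixel rates, the value of `lam`, and the function `select_modes` are all hypothetical and chosen only for illustration. The sketch also ignores the rate needed to convey the mode map itself.

```python
def select_modes(frame, candidates, rates, lam):
    """Pick, per pixel, the candidate with the lowest cost d + lam * r.

    frame:      flattened list of original pixel values
    candidates: dict mode name -> list of reconstructed pixel values
    rates:      dict mode name -> assumed per-pixel rate in bits (hypothetical)
    lam:        Lagrange multiplier trading rate against distortion
    """
    modes, recon, total_cost = [], [], 0.0
    for i, target in enumerate(frame):
        best_mode, best_cost = None, float("inf")
        for name, pixels in candidates.items():
            # Squared error as distortion, plus the weighted rate of this mode.
            cost = (pixels[i] - target) ** 2 + lam * rates[name]
            if cost < best_cost:
                best_mode, best_cost = name, cost
        modes.append(best_mode)
        recon.append(candidates[best_mode][i])
        total_cost += best_cost
    return modes, recon, total_cost


# Toy example (all values are made up): "skip" copies the prediction at no
# rate, while "coded" is closer to the original but costs 2 bits per pixel.
frame = [10, 20, 30, 40]
candidates = {"skip": [10, 20, 25, 48], "coded": [10, 21, 30, 40]}
rates = {"skip": 0.0, "coded": 2.0}

modes, recon, total_cost = select_modes(frame, candidates, rates, lam=4.0)
print(modes)       # ['skip', 'skip', 'coded', 'coded']
print(recon)       # [10, 20, 30, 40]
print(total_cost)  # 16.0
```

In this toy case the prediction is already accurate for the first two pixels, so copying it is cheapest, while the last two pixels justify spending bits. A learned selector such as the one described above would have to produce this kind of partitioning without access to the original frame at the decoder, which is why the partitioning must be conveyed in the bitstream.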