Dilated convolutional neural network-based deep reference picture generation for video compression
This work targets compression efficiency in video encoders, but the contribution is incremental: it builds on existing CNN and VVC frameworks.
The paper tackles inaccurate fractional-pixel motion compensation in video coding by proposing a dilated CNN-based deep reference picture generator, which achieves an average bit saving of 9.7% over VVC under the low-delay P configuration.
Motion estimation and motion compensation are indispensable parts of inter prediction in video coding. Since the motion vectors of objects are mostly in fractional-pixel units, the original reference pictures may not provide a suitable reference for motion compensation. In this paper, we propose a deep reference picture generator that creates a picture more relevant to the frame currently being encoded, thereby further reducing temporal redundancy and improving video compression efficiency. Inspired by recent progress in Convolutional Neural Networks (CNNs), we build the generator with a dilated CNN. Moreover, we insert the generated deep picture into Versatile Video Coding (VVC) as a reference picture and perform a comprehensive set of experiments to evaluate the effectiveness of our network on the latest VVC Test Model (VTM). The experimental results demonstrate that the proposed method achieves an average bit saving of 9.7% compared with VVC under the low-delay P configuration.
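To illustrate what "dilated" means here, the following is a minimal sketch of a dilated 2D convolution and of how stacking dilated layers enlarges the receptive field without adding weights. It is not the paper's generator: the function names `dilated_conv2d` and `receptive_field`, the 3x3 kernel, and the dilation rates (1, 2, 4) are assumptions chosen only for illustration, since the abstract does not specify the network architecture.

```python
def receptive_field(kernel_size, dilations):
    """Per-axis receptive field of stacked stride-1 dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv2d(image, kernel, dilation=1):
    """'Valid' 2D cross-correlation whose kernel taps are `dilation` pixels apart.

    `image` and `kernel` are lists of rows (lists of numbers).
    """
    kh, kw = len(kernel), len(kernel[0])
    span_h = (kh - 1) * dilation + 1  # height covered by the spread-out kernel
    span_w = (kw - 1) * dilation + 1  # width covered by the spread-out kernel
    out_h = len(image) - span_h + 1
    out_w = len(image[0]) - span_w + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[y + i * dilation][x + j * dilation]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical 5x5 test picture with pixel value 5 * row + col.
    picture = [[5 * r + c for c in range(5)] for r in range(5)]
    ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

    # With dilation 2, the nine taps of a 3x3 kernel cover a 5x5 area.
    print(dilated_conv2d(picture, ones, dilation=2))

    # Three 3x3 layers: plain stack vs. an assumed dilation schedule (1, 2, 4).
    print(receptive_field(3, [1, 1, 1]), receptive_field(3, [1, 2, 4]))
```

With the same nine weights per layer, the dilated stack in this sketch sees a 15-pixel-wide neighbourhood where the undilated one sees 7 pixels. A wider context of this kind is a plausible motivation for using dilation when synthesizing a reference picture for content that moves by several pixels between frames.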