Interpreting CNN for Low Complexity Learned Sub-pixel Motion Compensation in Video Coding
This work addresses the complexity barrier to practical deployment of neural networks in video coding, offering an incremental improvement over existing methods.
The paper tackles the high complexity of deep learning in video compression by interpreting the interpolation filters learned by neural networks to reduce computational cost, achieving up to 4.5% BD-rate savings for individual sequences compared with the baseline VVC.
Deep learning has shown great potential in image and video compression tasks. However, it brings bit savings at the cost of significant increases in coding complexity, which limits its use in practical applications. In this paper, a novel neural network-based tool is presented which improves the interpolation of reference samples needed for fractional-precision motion compensation. In contrast to previous efforts, the proposed approach focuses on complexity reduction, achieved by interpreting the interpolation filters learned by the networks. When the approach is implemented in the Versatile Video Coding (VVC) test model, up to 4.5% BD-rate saving for individual sequences is achieved compared with the baseline VVC, while the complexity of learned interpolation is significantly reduced compared to the application of the full neural network.
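
The abstract does not spell out how the learned filters are interpreted, so the following is only an illustrative sketch of one way such a reduction can work. It assumes the network consists purely of linear convolutional layers, with no biases and no activation functions, in which case the whole stack is equivalent to a single filter obtained by convolving the layer kernels together. The 1-D single-channel setting, the function names, and all kernel and sample values are hypothetical and chosen for illustration.

```python
def convolve_full(a, b):
    """Full 1-D discrete convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def convolve_valid(signal, kernel):
    """Keep only the outputs where the kernel fully overlaps the signal."""
    full = convolve_full(signal, kernel)
    k = len(kernel) - 1
    return full[k:len(full) - k]


def collapse_layers(kernels):
    """Fold a stack of linear conv layers (assumed: no bias, no activation)
    into one equivalent filter by convolving their kernels together."""
    combined = [1.0]
    for kernel in kernels:
        combined = convolve_full(combined, kernel)
    return combined


# Hypothetical layer kernels standing in for trained weights.
layers = [[0.25, 0.5, 0.25], [-0.1, 1.2, -0.1], [0.5, 0.5]]
# Hypothetical row of reference samples.
samples = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0, 13.0, 8.0, 10.0, 12.0]

# Layer-by-layer application, as running the full network would do.
layered = samples
for kernel in layers:
    layered = convolve_valid(layered, kernel)

# Single-pass application of the collapsed filter.
single_filter = collapse_layers(layers)
direct = convolve_valid(samples, single_filter)

# Both paths agree up to floating-point rounding.
max_diff = max(abs(x - y) for x, y in zip(layered, direct))
print(len(single_filter), max_diff < 1e-9)
```

Under these assumptions the collapsed filter can be computed once after training and then applied like a conventional interpolation filter, so no network inference is needed at coding time. The equivalence relies on linearity and would not hold exactly for a network with nonlinear activations.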