Video compression with low complexity CNN-based spatial resolution adaptation
The work addresses computational efficiency in video compression for applications that require low-complexity decoding; the contribution is incremental, since it builds on existing spatial resolution adaptation methods.
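The paper tackles the high decoder complexity of CNN-based super-resolution in resolution-adaptive video coding. It proposes a framework that shifts complexity to the encoder by using a CNN for down-sampling and a Lanczos3 filter for up-sampling. Integrated into HEVC HM 16.20 and tested on JVET UHD sequences in the All Intra configuration, it achieves bitrate savings of more than 10% over the original HM, together with reduced computational complexity at both the encoder (29%) and the decoder (10%).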
It has recently been demonstrated that spatial resolution adaptation can be integrated within video compression to improve overall coding performance by spatially down-sampling before encoding and super-resolving at the decoder. Significant improvements have been reported when convolutional neural networks (CNNs) were used to perform the resolution up-sampling. However, this approach suffers from high decoder complexity because of its use of CNN-based super-resolution. In this paper, a novel framework is proposed which supports flexible allocation of complexity between the encoder and decoder. This approach employs a CNN model for video down-sampling at the encoder and uses a Lanczos3 filter to reconstruct the full-resolution video at the decoder. The proposed method was integrated into the HEVC HM 16.20 software and evaluated on JVET UHD test sequences using the All Intra configuration. The experimental results demonstrate the potential of the proposed approach, with significant bitrate savings (more than 10%) over the original HEVC HM, coupled with reduced computational complexity at both the encoder (29%) and the decoder (10%).
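The decoder-side step described above can be illustrated with a minimal NumPy sketch of separable Lanczos3 up-sampling. The kernel itself is the standard Lanczos3 definition, sinc(x)·sinc(x/3) for |x| < 3; everything else is an assumption rather than a detail from the paper or from HM 16.20: the 2x factor, the center-aligned sample mapping, edge replication at frame borders, and weight normalization. The paper's CNN down-sampler is replaced by a hypothetical block-averaging function (`placeholder_downsample`) only so that the pipeline runs end to end, and the HEVC encode/decode step is omitted.

```python
import numpy as np

A = 3  # Lanczos3: the kernel spans 3 input samples on each side


def lanczos3(x):
    """Lanczos3 kernel: sinc(x) * sinc(x / 3) for |x| < 3, else 0."""
    x = np.asarray(x, dtype=np.float64)
    # np.sinc is the normalized sinc, sin(pi*x) / (pi*x), with sinc(0) = 1
    w = np.sinc(x) * np.sinc(x / A)
    return np.where(np.abs(x) < A, w, 0.0)


def upsample_1d(signal, factor):
    """Up-sample a 1-D signal by an integer factor with a Lanczos3 filter."""
    n = signal.shape[0]
    out = np.empty(n * factor, dtype=np.float64)
    for j in range(n * factor):
        # Assumed center-aligned mapping from output index j to input coordinate x
        x = (j + 0.5) / factor - 0.5
        base = int(np.floor(x))
        taps = np.arange(base - A + 1, base + A + 1)  # 2*A = 6 taps
        weights = lanczos3(x - taps)
        weights /= weights.sum()  # normalize so flat regions stay flat
        idx = np.clip(taps, 0, n - 1)  # assumed edge replication at borders
        out[j] = np.dot(weights, signal[idx])
    return out


def lanczos3_upsample(frame, factor=2):
    """Separable 2-D Lanczos3 up-sampling: filter rows, then columns."""
    rows = np.stack([upsample_1d(r, factor) for r in frame])        # (H, W*f)
    return np.stack([upsample_1d(c, factor) for c in rows.T]).T     # (H*f, W*f)


def placeholder_downsample(frame, factor=2):
    """HYPOTHETICAL stand-in for the paper's CNN down-sampler: block averaging."""
    h, w = frame.shape
    h, w = h - h % factor, w - w % factor  # crop to a multiple of the factor
    blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    luma = rng.uniform(0, 255, size=(64, 64))  # synthetic full-resolution luma frame
    low = placeholder_downsample(luma, 2)      # encoder side (a CNN in the paper)
    # ... the low-resolution frame would be HEVC-encoded and decoded here ...
    rec = lanczos3_upsample(low, 2)            # decoder side: fixed Lanczos3 filter
    print(low.shape, rec.shape)  # (32, 32) (64, 64)
```

Each output sample uses six fixed taps per dimension, so the decoder-side cost is a small constant per pixel, whereas CNN super-resolution typically needs many multiply-accumulate operations per pixel; moving the CNN to the encoder is what lets the framework trade encoder effort for a cheap decoder. A practical implementation would precompute the weights once per output phase instead of recomputing them in the loop.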