IV, AI, CV, LG, MM | Feb 27, 2021

Transform Network Architectures for Deep Learning based End-to-End Image/Video Coding in Subsampled Color Spaces

arXiv:2103.01760v2 | 19 citations
AI Analysis

This work addresses a gap in deep learning-based end-to-end coding: most existing methods assume non-subsampled RGB input, whereas practical video compression operates on subsampled color spaces such as YUV 4:2:0.

The paper adapts deep learning-based end-to-end image/video coding (DLEC) architectures to the YUV 4:2:0 color format, the primary format targeted by block-based standards such as HEVC and VVC. It proposes a new transform network architecture that achieves about 10% average BD-rate improvement over HEVC intra-frame coding.
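For readers unfamiliar with the metric, BD-rate (Bjøntegaard delta rate) is commonly defined as the average bitrate difference between two codecs at equal quality. The formula below is a sketch of that common definition, not a statement of the paper's exact evaluation setup. The symbols are notation assumed here: $\hat{r}(D)$ is a polynomial fit (typically cubic) of $\log_{10}$ bitrate as a function of quality $D$ (usually PSNR in dB), and $[D_L, D_H]$ is the quality range covered by both codecs.

```latex
\Delta R_{\mathrm{BD}}
  = \left( 10^{\,\frac{1}{D_H - D_L}
      \int_{D_L}^{D_H}
      \left[ \hat{r}_{\mathrm{test}}(D) - \hat{r}_{\mathrm{anchor}}(D) \right] dD}
    - 1 \right) \times 100\%
```

Under this convention a negative value means the test codec needs fewer bits than the anchor at the same quality, so "about 10% BD-rate improvement over HEVC" would correspond to $\Delta R_{\mathrm{BD}} \approx -10\%$ with HEVC intra coding as the anchor.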

Most of the existing deep learning based end-to-end image/video coding (DLEC) architectures are designed for non-subsampled RGB color format. However, in order to achieve a superior coding performance, many state-of-the-art block-based compression standards such as High Efficiency Video Coding (HEVC/H.265) and Versatile Video Coding (VVC/H.266) are designed primarily for YUV 4:2:0 format, where U and V components are subsampled by considering the human visual system. This paper investigates various DLEC designs to support YUV 4:2:0 format by comparing their performance against the main profiles of HEVC and VVC standards under a common evaluation framework. Moreover, a new transform network architecture is proposed to improve the efficiency of coding YUV 4:2:0 data. The experimental results on YUV 4:2:0 datasets show that the proposed architecture significantly outperforms naive extensions of existing architectures designed for RGB format and achieves about 10% average BD-rate improvement over the intra-frame coding in HEVC.
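To make the format difference concrete, the following minimal sketch converts an RGB image to YUV and subsamples the chroma planes by 2 in each direction, which is what 4:2:0 means. It is illustrative only and not taken from the paper. The BT.601 full-range weights, the 2x2 averaging filter, and all function names are assumptions made for this example.

```python
# Sketch: RGB -> YUV 4:2:0 using only the standard library.
# Assumptions: BT.601 full-range weights and 2x2 box averaging for chroma.

def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to (Y, U, V)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.5 * (b - y) / (1.0 - 0.114)  # blue-difference chroma, centred on 0
    v = 0.5 * (r - y) / (1.0 - 0.299)  # red-difference chroma, centred on 0
    return y, u, v

def subsample_2x2(plane):
    """Average each non-overlapping 2x2 block; height and width must be even."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]

def rgb_to_yuv420(image):
    """image: H x W nested list of (r, g, b) tuples.

    Returns (Y, U, V) planes: Y is H x W, U and V are H/2 x W/2.
    """
    yuv = [[rgb_to_yuv(*px) for px in row] for row in image]
    y_plane = [[p[0] for p in row] for row in yuv]
    u_plane = subsample_2x2([[p[1] for p in row] for row in yuv])
    v_plane = subsample_2x2([[p[2] for p in row] for row in yuv])
    return y_plane, u_plane, v_plane

# Usage: a 4x4 pure-red image.
image = [[(1.0, 0.0, 0.0)] * 4 for _ in range(4)]
y, u, v = rgb_to_yuv420(image)
num_samples = (len(y) * len(y[0])
               + len(u) * len(u[0])
               + len(v) * len(v[0]))
```

For the 4x4 input, the Y plane keeps 16 samples while U and V keep 4 each, so the image carries 24 samples instead of the 48 in RGB. Because the three planes no longer share one resolution, a network designed for a single 3-channel RGB tensor cannot consume YUV 4:2:0 data without some adaptation.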

