IV CV SP · Feb 1, 2022

CAESR: Conditional Autoencoder and Super-Resolution for Learned Spatial Scalability

arXiv:2202.00416v1
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient spatially scalable video coding. It offers an incremental advance on VVC-based coding for applications that must transmit video at adaptable resolutions.

The paper tackles spatial scalability in video coding with CAESR, a hybrid learning-based approach. CAESR codes a low-resolution base layer with VVC intra mode. It codes an enhancement layer with a conditional autoencoder and decodes that layer with a super-resolution module. The result performs competitively with full-resolution VVC intra coding while remaining scalable.
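To make the two-layer data flow concrete, here is a toy Python sketch on 1-D signals. Everything in it is an illustrative assumption, not the CAESR implementation:

- uniform quantization stands in for VVC intra coding of the base layer;
- pair-averaging and sample repetition stand in for the down- and upsampling filters;
- a single least-squares weight `alpha` stands in for the learned conditional autoencoder.

Residual coding is the special case `alpha = 1`. All helper names (`downscale`, `bl_codec`, `encode_el`, `decode_el`, ...) are hypothetical.

```python
"""Toy 1-D sketch of a two-layer spatially scalable coder (hypothetical, not CAESR)."""


def downscale(x):
    # 2x decimation by averaging non-overlapping pairs (assumed resampling filter).
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upscale(x):
    # 2x nearest-neighbour upsampling back to the full-resolution grid (assumed).
    out = []
    for v in x:
        out.extend([v, v])
    return out


def bl_codec(x, step=8.0):
    # Uniform scalar quantization: a placeholder for VVC intra coding of the BL.
    return [step * round(v / step) for v in x]


def energy(sig):
    # Sum of squares, used here as a crude proxy for the EL coding cost.
    return sum(v * v for v in sig)


def mixture_weight(x, x_up):
    # Least-squares alpha minimizing energy(x - alpha * x_up): a one-parameter
    # stand-in for the "optimal mixture" a conditional encoder would learn.
    den = energy(x_up)
    return sum(a * u for a, u in zip(x, x_up)) / den if den else 0.0


def encode_el(x, x_up, conditional=True):
    # Residual coding fixes alpha = 1; the conditional variant fits alpha to the data.
    alpha = mixture_weight(x, x_up) if conditional else 1.0
    return alpha, [a - alpha * u for a, u in zip(x, x_up)]


def decode_el(alpha, y, x_up):
    # The decoder also has the upscaled BL, so it can undo the mixture exactly
    # (no EL quantization is modeled in this toy).
    return [v + alpha * u for v, u in zip(y, x_up)]


if __name__ == "__main__":
    x = [10.0, 12.0, 30.0, 31.0, 55.0, 50.0, 20.0, 22.0]  # made-up "image row"
    x_up = upscale(bl_codec(downscale(x)))                # upscaled BL reconstruction
    for cond in (False, True):
        alpha, y = encode_el(x, x_up, conditional=cond)
        err = max(abs(p - q) for p, q in zip(decode_el(alpha, y, x_up), x))
        print(f"conditional={cond} alpha={alpha:.3f} "
              f"EL energy={energy(y):.1f} max error={err:.2e}")
```

The least-squares `alpha` minimizes the energy of the signal handed to the EL, so in this toy it can never do worse than fixing `alpha = 1`. This mirrors, in miniature, why a learned mixture can beat plain residual coding. A real AE-HP learns a nonlinear transform and an entropy model rather than a single scalar.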

In this paper, we present CAESR, a hybrid learning-based coding approach for spatial scalability based on the versatile video coding (VVC) standard. Our framework uses a low-resolution signal encoded with VVC intra mode as the base layer (BL), and a deep conditional autoencoder with hyperprior (AE-HP) as the enhancement-layer (EL) model. The EL encoder takes as input both the upscaled BL reconstruction and the original image. Our approach relies on conditional coding, which learns the optimal mixture of the source and the upscaled BL image and thereby performs better than residual coding. On the decoder side, a super-resolution (SR) module recovers high-resolution details and inverts the conditional coding process. Experimental results show that our solution is competitive with VVC full-resolution intra coding while being scalable.
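The claim that conditional coding can outperform residual coding has a standard information-theoretic justification. The argument is textbook and is not taken from the paper. Let $X$ be the source and $\tilde{X}$ the upscaled BL reconstruction, both treated as discrete random variables. The ideal rate of residual coding is $H(X - \tilde{X})$, and the ideal rate of conditional coding is $H(X \mid \tilde{X})$:

```latex
\begin{aligned}
H(X - \tilde{X}) &\ge H(X - \tilde{X} \mid \tilde{X}) && \text{(conditioning cannot increase entropy)} \\
                 &= H(X \mid \tilde{X})            && \text{(given } \tilde{X}\text{, } X \mapsto X - \tilde{X} \text{ is a bijection)}
\end{aligned}
```

So, under ideal lossless coding, conditioning on $\tilde{X}$ is never worse than coding the residual. A learned codec such as the AE-HP only approximates this bound in practice.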

