MMIV Apr 10, 2021

A Versatile Depth Video Encoding Scheme Based on Low-rank Tensor Modeling for Free Viewpoint Video

arXiv:2104.04678v2
AI Analysis

This work addresses a bottleneck in 3D display applications by reducing computational costs while maintaining rendering quality, representing an incremental improvement over existing methods.

The paper tackles the high encoding complexity of depth video compression in free-viewpoint video by proposing a low-complexity scheme based on low-rank tensor modeling and HEVC intra coding, achieving significant rate gains in compressing depth sequences for benchmark datasets like Ballet and Breakdancing.

Compression-induced quality losses in depth sequences determine the quality of view synthesis in free-viewpoint video. Depth map intra prediction in the 3D extensions of HEVC applies intra modes with auxiliary depth modeling modes (DMMs) to better preserve depth edges and handle motion discontinuities. Such modes enable high-efficiency compression, but at the cost of very high encoding complexity. Skipping conventional intra coding modes and DMMs in depth coding limits the practical applicability of HEVC for 3D display applications. In this paper, we introduce a novel low-complexity scheme for depth video compression based on low-rank tensor decomposition and HEVC intra coding. The proposed scheme leverages spatial and temporal redundancy by compactly representing the depth sequence as a high-order tensor. Tensor factorization into a set of factor matrices, following CANDECOMP/PARAFAC (CP) decomposition via alternating least squares, gives a low-rank approximation of the scene geometry. Further, compression of the factor matrices with HEVC intra prediction supports arbitrary target accuracy through flexible adjustment of bitrate, by varying the tensor decomposition rank and the quantization parameters. The results demonstrate that the proposed approach achieves significant rate gains by efficiently compressing depth planes in a low-rank approximated representation. The proposed algorithm is applied to encode depth maps of the benchmark Ballet and Breakdancing sequences. The decoded depth sequences are used for view synthesis in a multi-view video system, maintaining appropriate rendering quality.
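
To make the CP step in the abstract concrete, here is a minimal NumPy sketch of CP decomposition via alternating least squares. It is an illustration under stated assumptions, not the authors' implementation: it assumes the depth sequence is arranged as a third-order tensor (height x width x frames), whereas the abstract only says "high-order"; the function names (`unfold`, `khatri_rao`, `cp_als`, `relative_error`) are hypothetical; and the HEVC intra coding of the factor matrices is not modeled at all.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest (C order)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def cp_als(tensor, rank, n_iter=100, seed=0):
    """Fit a rank-`rank` CP model to a 3rd-order tensor with plain ALS.

    Assumption: fixed iteration count and random initialisation; a practical
    encoder would likely add a convergence test and factor normalisation.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Hold the other two factor matrices fixed and solve for this one.
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            # Gram matrix of the Khatri-Rao product via a Hadamard product.
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(tensor, mode) @ kr @ np.linalg.pinv(gram)
    return factors


def relative_error(tensor, factors):
    """Frobenius-norm error of the CP reconstruction, relative to the input."""
    approx = np.einsum("ir,jr,kr->ijk", *factors)
    return np.linalg.norm(tensor - approx) / np.linalg.norm(tensor)


if __name__ == "__main__":
    # Synthetic stand-in for a depth sequence (not real Ballet/Breakdancing data).
    rng = np.random.default_rng(42)
    height, width, frames, rank = 48, 64, 10, 3
    true_factors = [rng.standard_normal((n, rank)) for n in (height, width, frames)]
    depth = np.einsum("ir,jr,kr->ijk", *true_factors)

    est = cp_als(depth, rank=rank, n_iter=200)
    print("relative error:", relative_error(depth, est))
    print("values in tensor:", depth.size)
    print("values in factors:", sum(f.size for f in est))
```

For an H x W x T tensor, a rank-R CP model stores R(H + W + T) values instead of H·W·T, which is where the compact representation comes from. Raising R lowers the approximation error at the cost of larger factor matrices, which is one of the knobs the abstract mentions for trading bitrate against accuracy.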

