Saiping Zhang

CV
6 papers
33 citations
Novelty: 50%
AI Score: 23

6 Papers

IV · May 13, 2022
Slimmable Video Codec

Zhaocheng Liu, Luis Herranz, Fei Yang et al.

Neural video compression has emerged as a novel paradigm combining trainable multilayer neural networks and machine learning, achieving competitive rate-distortion (RD) performance, but it remains impractical because of heavy neural architectures with large memory and computational demands. In addition, models are usually optimized for a single RD tradeoff. Recent slimmable image codecs can dynamically adjust their model capacity to gracefully reduce memory and computation requirements without harming RD performance. In this paper we propose a slimmable video codec (SlimVC) that integrates a slimmable temporal entropy model into a slimmable autoencoder. Despite a significantly more complex architecture, we show that slimming remains a powerful mechanism to control rate, memory footprint, computational cost and latency, all of which are important requirements for practical video compression.
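The slimming idea can be illustrated with a single layer whose weights are shared across widths. The following is a minimal sketch, not the SlimVC architecture: the function name `slim_linear`, the `width` fraction, and the toy identity weights are all assumptions made for illustration.

```python
from typing import List


def slim_linear(x: List[float], weight: List[List[float]],
                bias: List[float], width: float) -> List[float]:
    """Apply a linear layer using only the first fraction `width` of its output units.

    `weight` has shape [out_features][in_features]. `x` may be shorter than
    in_features when the previous layer was slimmed as well; only the matching
    leading input columns are used.
    """
    if not 0.0 < width <= 1.0:
        raise ValueError("width must be in (0, 1]")
    out_active = max(1, int(len(weight) * width))  # number of active output channels
    in_active = len(x)                              # number of active input channels
    return [
        sum(w * v for w, v in zip(weight[o][:in_active], x)) + bias[o]
        for o in range(out_active)
    ]


# Hypothetical 4-in, 4-out layer; the same parameters serve every width.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
b = [0.0, 0.0, 0.0, 0.0]

full = slim_linear([1.0, 2.0, 3.0, 4.0], W, b, width=1.0)  # all 4 channels
half = slim_linear([1.0, 2.0], W, b, width=0.5)            # first 2 channels only
```

At half width the layer touches a quarter of the weights (2 of 4 rows, 2 of 4 columns), which is where the memory and computation savings in a slimmable model come from.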

CV · Jun 16, 2022
PeQuENet: Perceptual Quality Enhancement of Compressed Video with Adaptation- and Attention-based Network

Saiping Zhang, Luis Herranz, Marta Mrak et al.

In this paper we propose a generative adversarial network (GAN) framework to enhance the perceptual quality of compressed videos. Our framework includes attention and adaptation to different quantization parameters (QPs) in a single model. The attention module exploits global receptive fields that can capture and align long-range correlations between consecutive frames, which is beneficial for enhancing the perceptual quality of videos. The frame to be enhanced is fed into the deep network together with its neighboring frames, and in the first stage features at different depths are extracted. The extracted features are then fed into attention blocks to explore global temporal correlations, followed by a series of upsampling and convolution layers. Finally, the resulting features are processed by the QP-conditional adaptation module, which leverages the corresponding QP information. In this way, a single model can adapt its enhancement to various QPs without requiring a separate model for each QP value, while achieving similar performance. Experimental results demonstrate the superior performance of the proposed PeQuENet compared with state-of-the-art compressed video quality enhancement algorithms.
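The abstract does not say how the QP-conditional adaptation module uses the QP. One common way to condition features on a scalar is a per-channel affine modulation whose scale and shift depend on that scalar; the sketch below assumes that pattern. The function `qp_modulate`, its parameter names, and the HEVC-style QP range of 0 to 51 are illustrative assumptions, not the PeQuENet design.

```python
from typing import List, Sequence


def qp_modulate(features: Sequence[float], qp: float,
                scale_w: Sequence[float], scale_b: Sequence[float],
                shift_w: Sequence[float], shift_b: Sequence[float],
                qp_max: float = 51.0) -> List[float]:
    """Scale and shift each channel by an affine function of the normalized QP.

    Assumed form (hypothetical): out[c] = f[c] * (scale_w[c]*q + scale_b[c])
                                          + (shift_w[c]*q + shift_b[c]),
    where q = qp / qp_max.
    """
    q = qp / qp_max
    return [
        f * (sw * q + sb) + (hw * q + hb)
        for f, sw, sb, hw, hb in zip(features, scale_w, scale_b, shift_w, shift_b)
    ]


# With zero QP weights and unit scale the module is the identity for any QP.
same = qp_modulate([2.0, -1.0], qp=37, scale_w=[0.0, 0.0], scale_b=[1.0, 1.0],
                   shift_w=[0.0, 0.0], shift_b=[0.0, 0.0])

# At the maximum QP this toy setting doubles the feature and adds 0.5.
strong = qp_modulate([2.0], qp=51, scale_w=[1.0], scale_b=[1.0],
                     shift_w=[0.5], shift_b=[0.0])
```

Because the QP enters only through a handful of modulation parameters, one set of network weights can serve all QP values, which matches the single-model goal stated in the abstract.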

CV · Jan 29, 2022
Light field Rectification based on relative pose estimation

Xiao Huo, Dongyang Jin, Saiping Zhang et al.

Hand-held light field (LF) cameras have unique advantages in computer vision tasks such as 3D scene reconstruction and depth estimation. However, the related applications are limited by the ultra-small baseline, which leads, for example, to extremely low depth resolution in reconstruction. To solve this problem, we propose to rectify LFs to obtain a large baseline. Specifically, the proposed method aligns two LFs captured by two hand-held LF cameras with a random relative pose, and extracts the corresponding row-aligned sub-aperture images (SAIs) to obtain an LF with a large baseline. For accurate rectification, a pose estimation method is also proposed, in which the relative rotation and translation between the two LF cameras are estimated. The proposed pose estimation minimizes the degrees of freedom (DoF) in the LF-point-LF-point correspondence model and explicitly solves this model in a linear way. It outperforms state-of-the-art algorithms by providing more accurate results to support rectification. The significantly improved depth resolution in 3D reconstruction demonstrates the effectiveness of the proposed LF rectification.

IV · Jan 22, 2022
DCNGAN: A Deformable Convolutional-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video

Saiping Zhang, Luis Herranz, Marta Mrak et al.

In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flow, deformable convolutions are more effective and efficient at aligning frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms.
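A deformable convolution samples its input at positions shifted by per-tap offsets rather than on a fixed grid, interpolating between neighbors when a position is fractional. The sketch below shows this in one dimension only, as a simplification; the function names and the hand-set `offsets` are illustrative assumptions (in a real network the offsets are predicted from the input), and this is not the DCNGAN alignment module.

```python
import math
from typing import List, Sequence


def sample_linear(signal: Sequence[float], pos: float) -> float:
    """Linearly interpolate `signal` at fractional index `pos` (zero outside)."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i: int) -> float:
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deform_conv1d(signal: Sequence[float], kernel: Sequence[float],
                  offsets: Sequence[Sequence[float]]) -> List[float]:
    """1-D deformable convolution; offsets[i][k] shifts tap k at output position i."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Regular grid position plus the per-tap offset.
            pos = i + (k - radius) + offsets[i][k]
            acc += w * sample_linear(signal, pos)
        out.append(acc)
    return out


x = [1.0, 2.0, 4.0]
identity_kernel = [0.0, 1.0, 0.0]

# Zero offsets reduce to an ordinary convolution (here: the identity).
plain = deform_conv1d(x, identity_kernel, [[0.0, 0.0, 0.0]] * 3)

# Shifting the center tap by half a sample blends each value with its right neighbor.
shifted = deform_conv1d(x, identity_kernel, [[0.0, 0.5, 0.0]] * 3)
```

In a multi-frame setting the same mechanism lets each kernel tap look at a motion-compensated location in a neighboring frame, without computing an explicit optical flow field first.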

IV · Sep 22, 2021
DVC-P: Deep Video Compression with Perceptual Optimizations

Saiping Zhang, Marta Mrak, Luis Herranz et al.

Recent years have witnessed significant development of learning-based video compression methods, which aim at optimizing objective or perceptual quality and bit rates. In this paper, we introduce deep video compression with perceptual optimizations (DVC-P), which aims at increasing the perceptual quality of decoded videos. Our proposed DVC-P is based on the Deep Video Compression (DVC) network, but improves it with perceptual optimizations. Specifically, a discriminator network and a mixed loss are employed to help our network trade off among distortion, perception and rate. Furthermore, nearest-neighbor interpolation is used to eliminate checkerboard artifacts, which can appear in sequences encoded with DVC frameworks. Thanks to these two improvements, the perceptual quality of decoded sequences is improved. Experimental results demonstrate that, compared with the baseline DVC, our proposed method generates videos with higher perceptual quality, achieving a 12.27% reduction in a perceptual BD-rate equivalent, on average.
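The two ingredients named above can be sketched in a few lines. Checkerboard artifacts are commonly associated with transposed-convolution upsampling, and a frequently used remedy is to upsample by nearest-neighbor interpolation instead (usually followed by an ordinary convolution, omitted here). The mixed loss is shown as a plain weighted sum; the weights `lam_d`, `lam_p`, `lam_r` and their default values are hypothetical, since the abstract does not give the exact formulation.

```python
from typing import List, Sequence


def upsample_nearest(image: Sequence[Sequence[float]], factor: int = 2) -> List[List[float]]:
    """Enlarge a 2-D image by repeating every pixel `factor` times per axis."""
    out: List[List[float]] = []
    for row in image:
        wide = [v for v in row for _ in range(factor)]     # repeat along columns
        out.extend(list(wide) for _ in range(factor))      # repeat along rows
    return out


def mixed_loss(distortion: float, adversarial: float, rate: float,
               lam_d: float = 1.0, lam_p: float = 0.1, lam_r: float = 0.01) -> float:
    """Weighted sum of distortion, perceptual (adversarial) and rate terms.

    The weights are illustrative placeholders, not values from the paper.
    """
    return lam_d * distortion + lam_p * adversarial + lam_r * rate


up = upsample_nearest([[1, 2], [3, 4]])
loss = mixed_loss(1.0, 2.0, 3.0, lam_d=1.0, lam_p=0.5, lam_r=0.25)
```

Every output pixel of `upsample_nearest` receives exactly one input value, so no pixel gets a larger share of kernel overlap than its neighbors, which is the uneven-overlap pattern usually blamed for checkerboarding.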

CV · Jan 11, 2020
A Two-step Calibration Method for Unfocused Light Field Camera Based on Projection Model Analysis

Dongyang Jin, Saiping Zhang, Xiao Huo et al.

Accurately calibrating a light field camera is essential to its applications. Rapid progress has been made in this area in the past decades. In this paper, a detailed analysis was first performed of the state-of-the-art projection models for calibration, which were further interpreted in three representations: the correspondence between rays and pixels, between 3D physical points and pixels, and between 3D physical points and the 3D signal structure of the captured light field. Based on the analysis, parameters in the projection model were grouped into a direction parameter set and a depth parameter set. A two-step calibration method was then proposed, with each step dealing with one set of parameters. The proposed method is able to reuse traditional camera calibration methods for the direction parameter set. A simple raw-image-based calibration of the depth parameter set was further proposed. Systematic validations were conducted to evaluate the performance of the proposed calibration method. Experimental results show that the proposed method outperforms its counterparts in accuracy and robustness under various benchmark criteria.