Vibhoothi Vibhoothi

IV
h-index1
5papers
1citation
Novelty38%
AI Score42

5 Papers

IVMay 15
Video Quality Evaluation Methodology and Result of AV2 Compression Performance

Zhijun Lei, Vibhoothi Vibhoothi, Dzung Hoang et al.

The Alliance for Open Media (AOMedia) has developed the AV2 video coding standard to supersede AV1, aiming for substantial compression efficiency gains across diverse media applications. This paper details the quality and performance evaluation methodology defined in the AV2 Common Test Conditions (CTC), which introduces new evaluation methods and content, including convex-hull-based adaptive streaming (AS) configuration, user-generated content (UGC), and extended chroma formats. We present the coding gains of the AV2 (v13.0) against the AV1 baseline. Experimental results show that AV2 achieves significant Bjøntegaard-Delta Rate (BD-rate) reductions of 29.81\% and 33.79\% for PSNR-YUV and VMAF, respectively, under random access configuration, validating the efficiency of AV2 for next-generation streaming applications.

IVMay 14
Efficient Dense Matching for Enhanced Gaussian Splatting Using AV1 Motion Vectors

Julien Zouein, Vibhoothi Vibhoothi, François Pitié et al.

3D Gaussian Splatting (3DGS) has emerged as a prominent framework for real-time, photorealistic scene reconstruction, offering significant speed-ups over Neural Radiance Fields (NeRF). However, the fidelity of 3DGS representations remains heavily dependent on the quality of the initial point cloud. While standard Structure-from-Motion (SfM) pipelines using COLMAP provide adequate initialisation, they often suffer from high computational costs and sparsity in textureless regions, which degrades subsequent reconstruction accuracy and convergence speed. In this work, we introduce an AV1-based feature detection and matching pipeline that significantly reduces SfM processing overhead. By leveraging motion vectors inherent to the AV1 video codec, we bypass computationally expensive exhaustive matching while maintaining geometric robustness. Our pipeline produces substantially denser point clouds, with up to eight times as many points as classical SfM. We demonstrate that this enhanced initialisation directly improves 3DGS performance, yielding an 9-point increase in VMAF and a 63% average reduction in training time required to reach baseline quality. The project page: https://sigmedia.tv/AV1-3DGS.github.io/

IVOct 20, 2025
AV1 Motion Vector Fidelity and Application for Efficient Optical Flow

Julien Zouein, Vibhoothi Vibhoothi, Anil Kokaram

This paper presents a comprehensive analysis of motion vectors extracted from AV1-encoded video streams and their application in accelerating optical flow estimation. We demonstrate that motion vectors from AV1 video codec can serve as a high-quality and computationally efficient substitute for traditional optical flow, a critical but often resource-intensive component in many computer vision pipelines. Our primary contributions are twofold. First, we provide a detailed comparison of motion vectors from both AV1 and HEVC against ground-truth optical flow, establishing their fidelity. In particular we show the impact of encoder settings on motion estimation fidelity and make recommendations about the optimal settings. Second, we show that using these extracted AV1 motion vectors as a "warm-start" for a state-of-the-art deep learning-based optical flow method, RAFT, significantly reduces the time to convergence while achieving comparable accuracy. Specifically, we observe a four-fold speedup in computation time with only a minor trade- off in end-point error. These findings underscore the potential of reusing motion vectors from compressed video as a practical and efficient method for a wide range of motion-aware computer vision applications.

IVOct 14, 2025
LiteVPNet: A Lightweight Network for Video Encoding Control in Quality-Critical Applications

Vibhoothi Vibhoothi, François Pitié, Anil Kokaram

In the last decade, video workflows in the cinema production ecosystem have presented new use cases for video streaming technology. These new workflows, e.g. in On-set Virtual Production, present the challenge of requiring precise quality control and energy efficiency. Existing approaches to transcoding often fall short of these requirements, either due to a lack of quality control or computational overhead. To fill this gap, we present a lightweight neural network (LiteVPNet) for accurately predicting Quantisation Parameters for NVENC AV1 encoders that achieve a specified VMAF score. We use low-complexity features, including bitstream characteristics, video complexity measures, and CLIP-based semantic embeddings. Our results demonstrate that LiteVPNet achieves mean VMAF errors below 1.2 points across a wide range of quality targets. Notably, LiteVPNet achieves VMAF errors within 2 points for over 87% of our test corpus, c.f. approx 61% with state-of-the-art methods. LiteVPNet's performance across various quality regions highlights its applicability for enhancing high-value content transport and streaming for more energy-efficient, high-quality media experiences.

IVOct 23, 2024
Predicting total time to compress a video corpus using online inference systems

Xin Shu, Vibhoothi Vibhoothi, Anil Kokaram

Predicting the computational cost of compressing/transcoding clips in a video corpus is important for resource management of cloud services and VOD (Video On Demand) providers. Currently, customers of cloud video services are unaware of the cost of transcoding their files until the task is completed. Previous work concentrated on predicting perclip compression time, and thus estimating the cost of video compression. In this work, we propose new Machine Learning (ML) systems which predict cost for the entire corpus instead. This is a more appropriate goal since users are not interested in per-clip cost but instead the cost for the whole corpus. In this work, we evaluate our systems with respect to two video codecs (x264, x265) and a novel high-quality video corpus. We find that the accuracy of aggregate time prediction for a video corpus more than two times better than using per-clip predictions. Furthermore, we present an online inference framework in which we update the ML models as files are processed. A consideration of video compute overhead and appropriate choice of ML predictor for each fraction of corpus completed yields a prediction error of less than 5%. This is approximately two times better than previous work which proposed generalised predictors.