LiteVPNet: A Lightweight Network for Video Encoding Control in Quality-Critical Applications
This work addresses the need for accurate and efficient video encoding in quality-critical applications such as On-set Virtual Production. The contribution is incremental: it builds on existing encoder technology and adds a novel prediction method.
The paper tackles precise quality control and energy efficiency in video encoding for cinema production. It proposes LiteVPNet, a lightweight neural network that predicts Quantisation Parameters for NVENC AV1 encoders to achieve specified VMAF scores. The reported mean VMAF errors are below 1.2 points, and over 87% of test cases fall within 2 points, compared with approximately 61% for state-of-the-art methods.
In the last decade, video workflows in the cinema production ecosystem have presented new use cases for video streaming technology. These new workflows, e.g. in On-set Virtual Production, require both precise quality control and energy efficiency. Existing approaches to transcoding often fall short of these requirements, due either to a lack of quality control or to excessive computational overhead. To fill this gap, we present a lightweight neural network (LiteVPNet) for accurately predicting Quantisation Parameters for NVENC AV1 encoders that achieve a specified VMAF score. We use low-complexity features, including bitstream characteristics, video complexity measures, and CLIP-based semantic embeddings. Our results demonstrate that LiteVPNet achieves mean VMAF errors below 1.2 points across a wide range of quality targets. Notably, LiteVPNet achieves VMAF errors within 2 points for over 87% of our test corpus, compared with approximately 61% for state-of-the-art methods. LiteVPNet's performance across various quality regions highlights its applicability to high-value content transport and streaming, enabling more energy-efficient, high-quality media experiences.
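The two headline figures above (mean VMAF error, and the share of cases within 2 points of the target) can be illustrated with a minimal sketch. It assumes that "VMAF error" means the absolute difference between the target VMAF and the VMAF actually achieved at the predicted Quantisation Parameter, and that the 2-point tolerance is inclusive; the abstract does not spell out either detail. The function name `vmaf_error_summary` and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def vmaf_error_summary(target_vmaf, achieved_vmaf, tolerance=2.0):
    """Return (mean absolute VMAF error, fraction of cases within tolerance).

    Assumption: error is |achieved - target| and the tolerance is inclusive.
    """
    if len(target_vmaf) != len(achieved_vmaf) or not target_vmaf:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(a - t) for t, a in zip(target_vmaf, achieved_vmaf)]
    within = sum(e <= tolerance for e in errors) / len(errors)
    return mean(errors), within


# Made-up example values for illustration only, not results from the paper.
targets = [95.0, 90.0, 85.0, 80.0]
achieved = [94.5, 91.0, 88.0, 80.5]
mae, share = vmaf_error_summary(targets, achieved)
print(f"mean |VMAF error| = {mae:.2f}, within 2 points = {share:.0%}")
```

Reporting both numbers together is useful because a low mean error can hide a tail of badly missed targets, which the within-tolerance share makes visible.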