CV | Nov 12, 2025

GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow

arXiv:2511.09272v1 | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the problem of high power consumption and inflexibility in deploying generative video codecs on edge devices for video services, representing an incremental improvement in hardware optimization.

The paper tackles the challenge of deploying Animation-based Generative Codec (AGC) for talking-face video compression on resource-constrained edge devices by proposing an FPGA-oriented deployment scheme, achieving 24.9× and 4.1× higher energy efficiency compared to CPU and GPU, with 11.7 μJ per pixel.

The Animation-based Generative Codec (AGC) is an emerging paradigm for talking-face video compression. However, deploying its intricate decoder on resource- and power-constrained edge devices is challenging because of its large number of parameters, its limited flexibility to adapt to dynamically evolving algorithms, and the high power consumption induced by extensive computation and data transmission. This paper proposes, for the first time, a field-programmable gate array (FPGA)-oriented AGC deployment scheme for edge-computing video services. First, we analyze the AGC algorithm and apply network compression methods, including post-training static quantization and layer fusion. Next, we design an overlapped accelerator that uses the co-processor paradigm to perform computations through software-hardware co-design. The hardware processing unit comprises engines for convolution, grid sampling, upsampling, and other operations. Parallelization strategies such as double-buffered pipelines and loop unrolling are employed to fully exploit the FPGA's resources. Finally, we build an AGC FPGA prototype on the PYNQ-Z1 platform using the proposed scheme, achieving 24.9× and 4.1× higher energy efficiency than a commercial Central Processing Unit (CPU) and Graphics Processing Unit (GPU), respectively. Specifically, the FPGA system requires only 11.7 microjoules (μJ) per reconstructed pixel.
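The two network compression steps named in the abstract, layer fusion and post-training static quantization, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a per-output-channel linear layer followed by batch normalization (one common fusion case) and an 8-bit unsigned affine quantizer calibrated from sample values. All function names here are hypothetical.

```python
import math

def linear(weights, bias, x):
    """Apply one linear layer; weights holds one row per output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization, applied per channel."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Layer fusion: fold batch-norm parameters into the preceding layer."""
    folded_w, folded_b = [], []
    for row, b, g, bt, m, s in zip(weights, bias, gamma, beta, mean, var):
        k = g / math.sqrt(s + eps)           # per-channel rescaling factor
        folded_w.append([w * k for w in row])
        folded_b.append((b - m) * k + bt)
    return folded_w, folded_b

def calibrate(samples, num_bits=8):
    """Static quantization: fix scale and zero point from calibration data."""
    qmax = 2 ** num_bits - 1
    lo = min(min(samples), 0.0)              # keep 0.0 inside the range
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / qmax or 1.0          # avoid a zero scale
    zero_point = max(0, min(qmax, round(-lo / scale)))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    """Map a real value to an unsigned integer code, clamped to range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))

def dequantize(q, scale, zero_point):
    """Map an integer code back to its approximate real value."""
    return (q - zero_point) * scale
```

Folding removes the separate normalization pass at inference time, and fixing the scale and zero point ahead of time lets the hardware work on integers only. The bit width and calibration rule the authors actually use are not stated in the abstract.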

