Perception-Aware Video Semantic Communication
For real-time wireless video transmission, PVSC offers a practical solution that significantly reduces bandwidth while maintaining perceptual quality, addressing the limitations of conventional and learned codecs.
PVSC is a perception-aware video semantic communication framework that eliminates explicit motion-vector transmission and uses spatio-temporal feature coding for compact, channel-robust symbol streams. It achieves up to 75% and 87% bandwidth savings over VTM+5G LDPC at comparable LPIPS and DISTS, respectively, while enabling real-time inference on a single RTX 4090 GPU.
Ultra-high-resolution streaming and emerging immersive services are driving rapidly increasing wireless video traffic. However, perceptually pleasing video transmission over bandwidth-limited and latency-constrained wireless links remains challenging for conventional separated source-channel systems, which primarily target bit-level reliability and often suffer performance degradation under short-blocklength transmission. In addition, pixel-level distortion optimization does not necessarily align with human perception, while existing learned video codecs may incur high complexity and raise deployment issues. This paper proposes PVSC, a perception-aware video semantic communication framework for real-time wireless video transmission. PVSC eliminates explicit motion-vector transmission and exploits spatio-temporal feature coding to generate compact and channel-robust symbol streams. It also specifies side-information formatting, reference-buffer management, and lightweight rate control, enabling stable receiver-side reconstruction and bandwidth-adaptive inference with a single model. Extensive experiments demonstrate that PVSC achieves superior performance across diverse datasets, resolutions, GOP configurations, and channel conditions. Compared with the engineered ``VTM + 5G LDPC'' baseline, PVSC saves up to about 75% and 87% bandwidth at comparable LPIPS and DISTS, respectively, while enabling real-time inference on a single NVIDIA RTX 4090 GPU.