IV · AI · CV · MM · Feb 13

VineetVC: Adaptive Video Conferencing Under Severe Bandwidth Constraints Using Audio-Driven Talking-Head Reconstruction

arXiv:2602.12758v1
h-index: 4
Originality: Incremental advance
AI Analysis

The work targets video conferencing users on consumer and constrained networks where available bandwidth collapses. It is an incremental improvement: it combines existing technologies (WebRTC and audio-driven talking-head synthesis) with a new adaptation strategy.

The paper tackles the problem of video conferencing under severe bandwidth constraints by integrating WebRTC with an audio-driven talking-head reconstruction system, achieving a median bandwidth of 32.80 kbps for synthesized video streams.

Intense bandwidth depletion within consumer and constrained networks has the potential to undermine the stability of real-time video conferencing: encoder rate management becomes saturated, packet loss escalates, frame rates deteriorate, and end-to-end latency significantly increases. This work delineates an adaptive conferencing system that integrates WebRTC media delivery with a supplementary audio-driven talking-head reconstruction pathway and telemetry-driven mode regulation. The system consists of a WebSocket signaling service, an optional SFU for multi-party transmission, a browser client capable of real-time WebRTC statistics extraction and CSV telemetry export, and an AI REST service that processes a reference face image and recorded audio to produce a synthesized MP4; the browser can substitute its outbound camera track with the synthesized stream with a median bandwidth of 32.80 kbps. The solution incorporates a bandwidth-mode switching strategy and a client-side mode-state logger.
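The abstract does not spell out the switching rule, so the following is only a minimal sketch of what a telemetry-driven bandwidth-mode switch with a mode-state log could look like. It assumes a two-threshold hysteresis with a hold count; the names `ModeController`, `low_kbps`, `high_kbps`, and `hold_samples`, and all numeric values, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    VIDEO = "video"          # regular WebRTC camera track
    SYNTHETIC = "synthetic"  # audio-driven talking-head stream


@dataclass
class ModeController:
    # Illustrative assumptions only; the paper's actual thresholds are not given here.
    low_kbps: float = 150.0    # drop to SYNTHETIC below this estimate
    high_kbps: float = 300.0   # return to VIDEO above this estimate
    hold_samples: int = 3      # consecutive samples required before switching
    mode: Mode = Mode.VIDEO
    transitions: list = field(default_factory=list)  # mode-state log
    _streak: int = 0
    _tick: int = 0

    def update(self, available_kbps: float) -> Mode:
        """Feed one bandwidth estimate (kbps) and return the active mode."""
        if self.mode is Mode.VIDEO:
            wants_switch = available_kbps < self.low_kbps
        else:
            wants_switch = available_kbps > self.high_kbps

        # Count consecutive samples that argue for a switch; reset otherwise.
        self._streak = self._streak + 1 if wants_switch else 0

        if self._streak >= self.hold_samples:
            self.mode = Mode.SYNTHETIC if self.mode is Mode.VIDEO else Mode.VIDEO
            self._streak = 0
            self.transitions.append((self._tick, self.mode.value))

        self._tick += 1
        return self.mode


def run(samples):
    """Replay a list of bandwidth estimates and return modes plus the log."""
    ctrl = ModeController()
    modes = [ctrl.update(s).value for s in samples]
    return modes, ctrl.transitions


if __name__ == "__main__":
    modes, log = run([500, 120, 100, 90, 80, 200, 350, 400, 420])
    print(modes)
    print(log)
```

The gap between the two thresholds and the hold count keep the client from flapping between modes when the estimate hovers near a single cutoff. In a browser client, the same decision would presumably be driven by WebRTC statistics, and a switch would trigger the substitution of the outbound camera track.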

