ITLG
Nov 6, 2024

Large Generative Model-assisted Talking-face Semantic Communication System

arXiv:2411.03876v1
8 citations
h-index: 18
IEEE J Sel Area Commun
Originality: Incremental advance
AI Analysis

This work addresses bandwidth and quality issues in talking-face video communication, but it appears incremental because it builds on existing models such as FunASR and BERT-VITS2.

The study tackles challenges in talking-face semantic communication, such as low bandwidth utilization and semantic ambiguity, by introducing a system that uses generative models to convert talking-face video to text at the transmitter and back to video at the receiver. Simulation results indicate that the approach is feasible and effective.

The rapid development of generative Artificial Intelligence (AI) continually unveils the potential of Semantic Communication (SemCom). However, current talking-face SemCom systems still encounter challenges such as low bandwidth utilization, semantic ambiguity, and diminished Quality of Experience (QoE). This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system tailored for talking-face video communication. First, we introduce a Generative Semantic Extractor (GSE) at the transmitter, based on the FunASR model, to convert semantically sparse talking-face videos into text with high information density. Second, we establish a private Knowledge Base (KB) based on a Large Language Model (LLM) for semantic disambiguation and correction, complemented by a joint knowledge base-semantic-channel coding scheme. Finally, at the receiver, we propose a Generative Semantic Reconstructor (GSR) that uses the BERT-VITS2 and SadTalker models to transform the text back into a high-QoE talking-face video matching the user's timbre. Simulation results demonstrate the feasibility and effectiveness of the proposed LGM-TSC system.
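
The three stages named in the abstract (GSE, KB, GSR) can be read as a simple pipeline. The sketch below is only an illustration of that data flow, not the authors' implementation: every class, function, and parameter name is hypothetical, the stages are injected as plain callables rather than real FunASR, LLM, BERT-VITS2, or SadTalker calls, the channel is assumed noiseless, and applying the KB correction once at the transmitter is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TalkingFacePipeline:
    """Hypothetical wiring of the stages described in the abstract."""
    extract_text: Callable[[bytes], str]       # GSE: talking-face video -> text
    correct_text: Callable[[str], str]         # KB: disambiguation / correction
    channel: Callable[[bytes], bytes]          # coded channel (assumed given)
    synthesize_speech: Callable[[str], bytes]  # GSR step 1: text -> speech
    render_video: Callable[[bytes], bytes]     # GSR step 2: speech -> video

    def transmit(self, video: bytes) -> bytes:
        # Transmitter: extract text, correct it with the KB, send it as bytes.
        text = self.correct_text(self.extract_text(video))
        return self.channel(text.encode("utf-8"))

    def receive(self, payload: bytes) -> bytes:
        # Receiver: decode the text and regenerate a talking-face video.
        text = payload.decode("utf-8", errors="replace")
        return self.render_video(self.synthesize_speech(text))


# Toy stand-ins so the sketch runs end to end; none of this is real model output.
TOY_KB = {"recieve": "receive"}


def demo() -> tuple[bytes, bytes]:
    pipe = TalkingFacePipeline(
        extract_text=lambda video: "please recieve the file",
        correct_text=lambda t: " ".join(TOY_KB.get(w, w) for w in t.split()),
        channel=lambda b: b,  # noiseless placeholder channel
        synthesize_speech=lambda t: b"AUDIO:" + t.encode("utf-8"),
        render_video=lambda audio: b"VIDEO:" + audio,
    )
    payload = pipe.transmit(bytes(1024))  # placeholder "video" bytes
    return payload, pipe.receive(payload)


if __name__ == "__main__":
    payload, video_out = demo()
    print(payload)
    print(video_out)
```

Passing the stages in as callables keeps the sketch independent of any particular model API. Only the corrected text crosses the channel, which is the source of the bandwidth saving the abstract describes.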

