IT LG | Nov 6, 2024

Large Generative Model-assisted Talking-face Semantic Communication System

Feibo Jiang, Siwei Tu, Li Dong, Cunhua Pan, Jiangzhou Wang, Xiaohu You

arXiv:2411.03876v15.18 citations | h-index: 71 | IEEE J Sel Area Commun

Originality: Incremental advance

AI Analysis

This work addresses bandwidth and quality issues in talking-face video communication, but it appears incremental because it builds on existing models such as FunASR and BERT-VITS2.

The study addresses challenges in talking-face semantic communication, such as low bandwidth utilization and semantic ambiguity, by introducing a system that uses generative models to convert video to text and back. Simulations indicate that the approach is feasible and effective.

The rapid development of generative Artificial Intelligence (AI) continually unveils the potential of Semantic Communication (SemCom). However, current talking-face SemCom systems still encounter challenges such as low bandwidth utilization, semantic ambiguity, and diminished Quality of Experience (QoE). This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system tailored for talking-face video communication. First, we introduce a Generative Semantic Extractor (GSE) at the transmitter, based on the FunASR model, to convert semantically sparse talking-face videos into text with high information density. Second, we establish a private Knowledge Base (KB) based on a Large Language Model (LLM) for semantic disambiguation and correction, complemented by a joint knowledge base-semantic-channel coding scheme. Finally, at the receiver, we propose a Generative Semantic Reconstructor (GSR) that uses the BERT-VITS2 and SadTalker models to transform the text back into a high-QoE talking-face video matching the user's timbre. Simulation results demonstrate the feasibility and effectiveness of the proposed LGM-TSC system.
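
The three stages named in the abstract can be read as a simple data flow: video to text, text correction, transmission, and text back to video. The following is a minimal sketch of that flow only, not the authors' implementation. Every name in it (`TalkingFacePipeline`, `toy_extract`, `toy_correct`, `ideal_channel`, `toy_reconstruct`) is hypothetical. The toy functions stand in for FunASR, the LLM-based KB, and BERT-VITS2 with SadTalker, and call none of those libraries. Applying correction only before transmission is also an assumption, since the abstract does not say where the KB sits, and the joint coding scheme is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TalkingFacePipeline:
    """Text-as-semantics flow: video -> text -> channel -> text -> video."""
    extract_text: Callable[[bytes], str]   # GSE stand-in (FunASR in the paper)
    correct_text: Callable[[str], str]     # KB stand-in (LLM-based in the paper)
    channel: Callable[[bytes], bytes]      # physical channel stand-in
    reconstruct: Callable[[str], bytes]    # GSR stand-in (BERT-VITS2 + SadTalker)

    def run(self, video: bytes) -> bytes:
        # Transmitter: extract text, then disambiguate/correct it (assumed order).
        text = self.correct_text(self.extract_text(video))
        # Only the compact text payload crosses the channel.
        received = self.channel(text.encode("utf-8"))
        # Receiver: regenerate a video from the received text.
        return self.reconstruct(received.decode("utf-8", errors="replace"))


# Toy stand-ins, purely illustrative (hypothetical, not real model calls).
def toy_extract(video: bytes) -> str:
    # Pretend the "video" bytes already contain the spoken words.
    return video.decode("utf-8")


FIXES = {"recieve": "receive"}  # tiny made-up correction table


def toy_correct(text: str) -> str:
    # Replace known misrecognized words; leave everything else unchanged.
    return " ".join(FIXES.get(word, word) for word in text.split())


def ideal_channel(payload: bytes) -> bytes:
    # Noise-free channel: bytes arrive unchanged.
    return payload


def toy_reconstruct(text: str) -> bytes:
    # Placeholder for speech synthesis plus face animation.
    return f"<video saying: {text}>".encode("utf-8")


pipeline = TalkingFacePipeline(toy_extract, toy_correct, ideal_channel, toy_reconstruct)
output = pipeline.run(b"please recieve the file")
```

Passing the stages as callables keeps the sketch independent of any particular model, so a real speech recognizer or synthesizer could replace a toy function without changing `run`.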
