Latent Diffusion Model-Enabled Low-Latency Semantic Communication in the Presence of Semantic Ambiguities and Wireless Channel Noises
This work addresses efficiency and robustness issues in semantic communication networks, which is critical for real-time applications, though it appears incremental by building on existing diffusion model techniques.
The paper tackles the challenges of semantic communication systems, including sensitivity to wireless channel noise and poor generalization, by developing a latent diffusion model-enabled system that achieves robustness to outliers, handles out-of-distribution data, and performs low-latency denoising, outperforming existing methods in metrics like MS-SSIM and LPIPS.
Deep learning (DL)-based Semantic Communications (SemCom) is becoming critical to maximize overall efficiency of communication networks. Nevertheless, SemCom is sensitive to wireless channel uncertainties, source outliers, and suffer from poor generalization bottlenecks. To address the mentioned challenges, this paper develops a latent diffusion model-enabled SemCom system with three key contributions, i.e., i) to handle potential outliers in the source data, semantic errors obtained by projected gradient descent based on the vulnerabilities of DL models, are utilized to update the parameters and obtain an outlier-robust encoder, ii) a lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter and is placed before the decoder at the receiver, enabling adaptation for out-of-distribution data and enhancing human-perceptual quality, and iii) an end-to-end consistency distillation (EECD) strategy is used to distill the diffusion models trained in latent space, enabling deterministic single or few-step low-latency denoising in various noisy channels while maintaining high semantic quality. Extensive numerical experiments across different datasets demonstrate the superiority of the proposed SemCom system, consistently proving its robustness to outliers, the capability to transmit data with unknown distributions, and the ability to perform real-time channel denoising tasks while preserving high human perceptual quality, outperforming the existing denoising approaches in semantic metrics such as multi-scale structural similarity index measure (MS-SSIM) and learned perceptual image path similarity (LPIPS).