CV | SP | May 16, 2024

Language-Oriented Semantic Latent Representation for Image Transmission

arXiv:2405.09976v132 citations | h-index: 26 | Has Code | MLSP
Originality: Incremental advance
AI Analysis

This paper addresses the perceptual difference between intended and reconstructed images in semantic communication systems, offering an incremental improvement over existing methods.

The paper tackles the problem of coarse text representations in semantic communication for image transmission. It proposes a framework that communicates both text and a compressed image embedding, achieving higher perceptual similarity than a text-only baseline while transmitting only 2.09% of the original image size.

In the new paradigm of semantic communication (SC), the focus is on delivering meanings behind bits by extracting semantic information from raw data. Recent advances in data-to-text models facilitate language-oriented SC, particularly for text-transformed image communication via image-to-text (I2T) encoding and text-to-image (T2I) decoding. However, although semantically aligned, the text is too coarse to precisely capture sophisticated visual features such as spatial locations, color, and texture, incurring a significant perceptual difference between intended and reconstructed images. To address this limitation, in this paper, we propose a novel language-oriented SC framework that communicates both text and a compressed image embedding and combines them using a latent diffusion model to reconstruct the intended image. Experimental results validate the potential of our approach, which transmits only 2.09% of the original image size while achieving higher perceptual similarities in noisy communication channels compared to a baseline SC method that communicates only through text. The code is available at https://github.com/ispamm/Img2Img-SC/.
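
The "fraction of the original image size" figure in the abstract is a ratio of transmitted payload (text plus compressed embedding) to the size of the source image. The sketch below only illustrates that arithmetic. The function name `payload_fraction`, the caption, the latent shape, and the image dimensions are all hypothetical values chosen for illustration; they are not taken from the paper and are not expected to reproduce its 2.09% result.

```python
def payload_fraction(text_bytes: int, embedding_bytes: int, image_bytes: int) -> float:
    """Return transmitted payload size as a fraction of the original image size."""
    if image_bytes <= 0:
        raise ValueError("image_bytes must be positive")
    if text_bytes < 0 or embedding_bytes < 0:
        raise ValueError("payload sizes must be non-negative")
    return (text_bytes + embedding_bytes) / image_bytes


# Hypothetical sizes, chosen only to illustrate the arithmetic (not from the paper).
caption = "a red bicycle leaning against a brick wall"
text_bytes = len(caption.encode("utf-8"))   # text description sent over the channel
embedding_bytes = 4 * 32 * 32 * 2           # assumed 4x32x32 latent stored as 16-bit floats
image_bytes = 512 * 512 * 3                 # assumed uncompressed 512x512 RGB image, 8 bits per channel

fraction = payload_fraction(text_bytes, embedding_bytes, image_bytes)
print(f"{fraction:.2%} of the original image size")
```

A text-only baseline corresponds to `embedding_bytes = 0` in this accounting, which makes the trade-off explicit: the embedding adds payload in exchange for the finer visual detail that text alone cannot carry.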

Code Implementations: 1 repo
