CV, May 11

ChronoSC: Task-Oriented Semantic Communication via Temporal-to-Color Encoding

Phuc H. Nguyen, Trung T. Nguyen, Quy N. Duong, Van-Dinh Nguyen

arXiv:2605.16388

AI Analysis

This work addresses the problem of efficient video transmission for downstream vision-language tasks in low-resource settings, offering a lightweight alternative to complex spatiotemporal pipelines.

ChronoSC proposes a task-oriented semantic communication framework for VideoQA that encodes temporal video dynamics into a single static image via Chrono-Color Stacking, achieving up to 192x bandwidth reduction while maintaining high accuracy on CLEVRER.

Semantic communication (SC) aims to reduce transmission overhead by conveying task-relevant information rather than raw data. However, existing SC approaches for video largely focus on pixel-level reconstruction or rely on complex spatiotemporal pipelines, leading to excessive bandwidth usage and latency that are unsuitable for low-resource deployments. In this paper, we propose ChronoSC, a task-oriented semantic communication framework for Video Question Answering (VideoQA). ChronoSC introduces Chrono-Color Stacking, a lightweight and lossless projection scheme that encodes temporal video dynamics into a single static image, enabling extreme temporal compression before transmission. This compact semantic representation is transmitted using a lightweight Deep Joint Source-Channel Coding (DeepJSCC) transceiver and explicitly reconstructed at the receiver. Unlike latent-space methods, explicit visual reconstruction enables the direct reuse of pre-trained vision-language models; specifically, a pre-trained BLIP model is employed to infer answers from noisy, reconstructed chrono-images. Experiments on the CLEVRER dataset show that ChronoSC achieves up to 192 times bandwidth reduction compared to raw video transmission while maintaining high VideoQA accuracy.
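
As a rough illustration of what a temporal-to-color encoding could look like, the sketch below packs three grayscale frames into the R, G, and B channels of one image. The channel assignment, the three-frame limit, and the function names chrono_stack and chrono_unstack are assumptions made for illustration; the abstract does not specify how Chrono-Color Stacking is actually defined.

```python
# Illustrative sketch only: the frame-to-channel mapping below is an assumption,
# not the paper's confirmed Chrono-Color Stacking scheme.

def chrono_stack(frames):
    """Pack three equally sized grayscale frames (oldest first) into one RGB image.

    Each frame is a list of rows of ints in 0..255. The result is a list of rows
    of (r, g, b) tuples, where r, g, b come from frames 0, 1, 2 respectively.
    """
    if len(frames) != 3:
        raise ValueError("this sketch expects exactly three frames")
    f0, f1, f2 = frames
    return [
        [(a, b, c) for a, b, c in zip(row0, row1, row2)]
        for row0, row1, row2 in zip(f0, f1, f2)
    ]


def chrono_unstack(image):
    """Recover the three grayscale frames from a stacked RGB image."""
    return [
        [[pixel[ch] for pixel in row] for row in image]
        for ch in range(3)
    ]


# A bright dot moving left to right across a 1x3 strip over three time steps.
frames = [
    [[255, 0, 0]],
    [[0, 255, 0]],
    [[0, 0, 255]],
]
stacked = chrono_stack(frames)
print(stacked)  # [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
```

In this toy layout a static pixel has equal R, G, and B values and appears gray, while motion shows up as color. The round trip through chrono_unstack is exact, which is one sense in which such a stacking can be lossless for the frames it packs.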
