CV · May 11

ChronoSC: Task-Oriented Semantic Communication via Temporal-to-Color Encoding

arXiv:2605.16388
AI Analysis

This work addresses the problem of efficient video transmission for downstream vision-language tasks in low-resource settings, offering a lightweight alternative to complex spatiotemporal pipelines.

ChronoSC proposes a task-oriented semantic communication framework for VideoQA that encodes temporal video dynamics into a single static image via Chrono-Color Stacking, achieving up to 192x bandwidth reduction while maintaining high accuracy on CLEVRER.

Semantic communication (SC) aims to reduce transmission overhead by conveying task-relevant information rather than raw data. However, existing SC approaches for video largely focus on pixel-level reconstruction or rely on complex spatiotemporal pipelines, leading to excessive bandwidth usage and latency that are unsuitable for low-resource deployments. In this paper, we propose ChronoSC, a task-oriented semantic communication framework for Video Question Answering (VideoQA). ChronoSC introduces Chrono-Color Stacking, a lightweight and lossless projection scheme that encodes temporal video dynamics into a single static image, enabling extreme temporal compression before transmission. This compact semantic representation is transmitted using a lightweight Deep Joint Source-Channel Coding (DeepJSCC) transceiver and explicitly reconstructed at the receiver. Unlike latent-space methods, explicit visual reconstruction enables the direct reuse of pre-trained vision-language models; specifically, a pre-trained BLIP model is employed to infer answers from noisy, reconstructed chrono-images. Experiments on the CLEVRER dataset show that ChronoSC achieves up to 192 times bandwidth reduction compared to raw video transmission while maintaining high VideoQA accuracy.
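The abstract does not spell out how Chrono-Color Stacking works, so the following is only a minimal sketch of one plausible temporal-to-color encoding, not the authors' method. It assumes grayscale input frames with values in [0, 1], tints each frame with a hue that depends on its time index (red for early, blue for late), and merges the tinted frames with a per-pixel maximum. The names `time_color` and `chrono_color_stack` are hypothetical, and this simple variant is not lossless in general.

```python
import colorsys


def time_color(t, num_frames):
    """Map frame index t to an RGB tint, sweeping hue from red (early) to blue (late)."""
    hue = (2.0 / 3.0) * t / max(num_frames - 1, 1)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


def chrono_color_stack(frames):
    """Collapse T grayscale frames (values in [0, 1]) into one RGB image.

    ASSUMPTION: frames are merged by a per-pixel, per-channel maximum of
    time-tinted frames. This is an illustrative choice, not the paper's scheme.
    """
    num_frames = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    image = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    for t, frame in enumerate(frames):
        color = time_color(t, num_frames)
        for y in range(height):
            for x in range(width):
                for c in range(3):
                    # Keep the strongest tinted response seen so far at this pixel.
                    image[y][x][c] = max(image[y][x][c], frame[y][x] * color[c])
    return image


# Toy input: a bright dot moving left to right across a 1x3 strip over three frames.
frames = [
    [[1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0]],
]
chrono_image = chrono_color_stack(frames)
```

In this toy case the single output image shows the dot's early, middle, and late positions in red, green, and blue tints, so the motion direction can be read from color alone. Under this assumption, sending one image instead of T frames of the same size cuts the raw payload by a factor of T before any channel coding; the abstract does not state the frame count or resolution behind the reported 192 times figure.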

