CVSep 18, 2024

Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression

arXiv:2409.11718v216 citationsh-index: 7
Originality Incremental advance
AI Analysis

This work addresses the need for more efficient video compression for analysis tasks, but it is incremental as it builds on existing methods by incorporating off-the-shelf models.

The paper tackles the problem of limited semantic richness in unsupervised video semantic compression by leveraging semantics from visual foundation models, achieving improved performance on three mainstream tasks and six datasets.

Unsupervised video semantic compression (UVSC), i.e., compressing videos to better support various analysis tasks, has recently garnered attention. However, the semantic richness of previous methods remains limited, due to the single semantic learning objective, limited training data, etc. To address this, we propose to boost the UVSC task by absorbing the off-the-shelf rich semantics from VFMs. Specifically, we introduce a VFMs-shared semantic alignment layer, complemented by VFM-specific prompts, to flexibly align semantics between the compressed video and various VFMs. This allows different VFMs to collaboratively build a mutually-enhanced semantic space, guiding the learning of the compression model. Moreover, we introduce a dynamic trajectory-based inter-frame compression scheme, which first estimates the semantic trajectory based on the historical content, and then traverses along the trajectory to predict the future semantics as the coding context. This reduces the overall bitcost of the system, further improving the compression efficiency. Our approach outperforms previous coding methods on three mainstream tasks and six datasets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes