CV · Feb 1

MTC-VAE: Multi-Level Temporal Compression with Content Awareness

arXiv:2602.01340v1
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in video compression for generative models, offering an incremental improvement to the efficiency of Latent Video Diffusion Models.

The paper tackles the performance decline that Variational Autoencoders (VAEs) for video compression suffer as the compression rate increases. It introduces a method that converts fixed-rate VAEs into multi-level temporal compression models with minimal fine-tuning, improving efficiency without expanding the hidden channels.

Latent Video Diffusion Models (LVDMs) rely on Variational Autoencoders (VAEs) to compress videos into compact latent representations. For continuous VAEs, achieving higher compression rates is desirable; yet efficiency declines notably when extra sampling layers are added without expanding the dimensions of the hidden channels. In this paper, we present a technique to convert fixed-compression-rate VAEs into models that support multi-level temporal compression, providing a straightforward approach that needs only minimal fine-tuning to counteract the performance decline at elevated compression rates. Moreover, we examine how varying compression levels affect model performance on video segments with diverse characteristics, offering empirical evidence of the effectiveness of our proposed approach. We also investigate integrating our multi-level temporal compression VAE with a diffusion-based generative model, DiT, highlighting successful concurrent training and compatibility within this framework. This investigation illustrates the potential uses of multi-level temporal compression.
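
To make the trade-off in the abstract concrete, the sketch below works through how the latent size shrinks as the temporal compression level rises while the channel count stays fixed. It is illustrative arithmetic only, not the paper's implementation: the function names, the spatial factor of 8, the 16 latent channels, and the temporal factors 4, 8, and 16 are all assumptions, and real video VAEs may pad or round frame counts differently.

```python
import math


def latent_shape(frames, height, width, temporal_factor,
                 spatial_factor=8, channels=16):
    """Return the (C, T', H', W') latent shape under the assumed factors.

    All default values are hypothetical, chosen only for illustration.
    """
    if temporal_factor < 1 or spatial_factor < 1:
        raise ValueError("compression factors must be >= 1")
    return (
        channels,
        math.ceil(frames / temporal_factor),   # temporal downsampling
        math.ceil(height / spatial_factor),    # spatial downsampling
        math.ceil(width / spatial_factor),
    )


def compression_ratio(frames, height, width, temporal_factor,
                      spatial_factor=8, channels=16, in_channels=3):
    """Ratio of input element count to latent element count."""
    c, t, h, w = latent_shape(frames, height, width, temporal_factor,
                              spatial_factor, channels)
    return (in_channels * frames * height * width) / (c * t * h * w)


if __name__ == "__main__":
    # Hypothetical clip: 64 RGB frames at 256x256.
    for factor in (4, 8, 16):
        shape = latent_shape(64, 256, 256, factor)
        ratio = compression_ratio(64, 256, 256, factor)
        print(f"temporal x{factor}: latent {shape}, ratio {ratio:.0f}x")
```

Under these assumptions, each doubling of the temporal factor halves the number of latent frames and doubles the overall compression ratio, because the channel dimension does not grow to compensate. That is the regime in which the abstract reports a performance decline.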

