MM | LG | Dec 18, 2024

FlexCache: Flexible Approximate Cache System for Video Diffusion

Desen Sun, Henry Tian, Tim Lu, Sihang Liu

arXiv:2501.04012v1 | 7 citations | h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses the slow generation times of text-to-video applications, though it is an incremental improvement focused on caching optimization.

The paper tackles the high computational cost of video diffusion models by introducing FlexCache, a flexible approximate cache system that reduces cache storage by 6.7× on average and achieves 1.26× higher throughput with 25% lower cost compared to state-of-the-art methods.

Text-to-video applications are receiving increasing public attention. Among the underlying approaches, diffusion models have emerged as the most prominent, offering impressive quality in visual content generation. However, they still suffer from substantial computational complexity, often requiring several minutes to generate a single video. While prior research has addressed the computational overhead of text-to-image diffusion models, those techniques are not directly suitable for video diffusion models because video generation has significantly larger cache requirements and higher computational demands. We present FlexCache, a flexible approximate cache system that addresses these challenges with two main designs. First, we compress the caches before saving them to storage; our compression strategy reduces storage consumption by 6.7 times on average. Second, we find that the approximate cache system can achieve a higher hit rate and greater computation savings by decoupling the object and the background. We further design a tailored cache replacement policy to better support these two techniques. In our evaluation, FlexCache reaches 1.26 times higher throughput and 25% lower cost compared to the state-of-the-art diffusion approximate cache system.
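The general idea of an approximate cache with separate object and background stores might be sketched as follows. This is a toy illustration and not the paper's implementation: the class name `ApproxCache`, the cosine-similarity lookup, the threshold value, the two-dimensional embeddings, and the string payloads are all assumptions made for the example, and plain LRU eviction stands in for the tailored replacement policy, which the abstract does not describe. Compression is omitted.

```python
import math
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ApproxCache:
    """Toy approximate cache: a lookup hits when a stored key is similar enough."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity
        self.threshold = threshold
        # name -> (embedding, payload), ordered from least to most recently used
        self.entries = OrderedDict()

    def get(self, embedding):
        """Return the payload of the most similar entry, or None on a miss."""
        best_name, best_sim = None, self.threshold
        for name, (key, _) in self.entries.items():
            sim = cosine(embedding, key)
            if sim >= best_sim:
                best_name, best_sim = name, sim
        if best_name is None:
            return None
        self.entries.move_to_end(best_name)  # mark as recently used
        return self.entries[best_name][1]

    def put(self, name, embedding, payload):
        """Insert an entry, evicting the least recently used one if over capacity.

        LRU is a placeholder here (assumption), not the paper's policy.
        """
        self.entries[name] = (embedding, payload)
        self.entries.move_to_end(name)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


# Hypothetical usage: one cache per component, so an object hit does not
# require the background to match as well.
objects = ApproxCache(capacity=2, threshold=0.9)
backgrounds = ApproxCache(capacity=2, threshold=0.9)
objects.put("dog", [1.0, 0.0], "dog-latent")
backgrounds.put("beach", [0.0, 1.0], "beach-latent")
object_hit = objects.get([0.95, 0.05])      # close to the "dog" key
background_hit = backgrounds.get([1.0, 0.0])  # orthogonal to the "beach" key
```

Keeping the two stores separate means a request can reuse a cached object even when no cached background is similar enough, which is one plausible reading of why decoupling raises the hit rate.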
