CV · Apr 10

StreamMeCo: Long-Term Agent Memory Compression for Efficient Streaming Video Understanding

Junxi Wang, Te Sun, Jiayi Zhu, Junxian Li, Haowen Xu, Zichen Wen, Xuming Hu, Zhiyu Li, Linfeng Zhang

arXiv:2604.0900090.44 citations · Has Code

AI Analysis

This work addresses storage and computation costs for vision agents in streaming video applications, representing an incremental improvement in efficiency.

The paper tackles the high memory overhead in streaming video understanding by proposing StreamMeCo, a memory compression framework that achieves a 1.87x speedup in memory retrieval and a 1.0% average accuracy improvement under 70% compression.

Vision agent memory has shown remarkable effectiveness in streaming video understanding. However, storing such memory for videos incurs substantial memory overhead, leading to high costs in both storage and computation. To address this issue, we propose StreamMeCo, an efficient Stream Agent Memory Compression framework. Specifically, based on the connectivity of the memory graph, StreamMeCo introduces edge-free minmax sampling for isolated nodes and edge-aware weight pruning for connected nodes, evicting redundant memory nodes while maintaining accuracy. In addition, we introduce a time-decay memory retrieval mechanism to further eliminate the performance degradation caused by memory compression. Extensive experiments on three challenging benchmark datasets (M3-Bench-robot, M3-Bench-web and Video-MME-Long) demonstrate that under 70% memory graph compression, StreamMeCo achieves a 1.87x speedup in memory retrieval while delivering an average accuracy improvement of 1.0%. Our code is available at https://github.com/Celina-love-sweet/StreamMeCo.
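To make the three ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give formulas, so the function names (`split_by_connectivity`, `minmax_sample`, `time_decay_score`), the greedy farthest-point rule standing in for "minmax sampling", the exponential decay form, and the `decay` parameter are all assumptions chosen for illustration. Edge-aware weight pruning is omitted because the abstract does not say how edge weights are defined.

```python
import math


def split_by_connectivity(nodes, edges):
    """Partition node ids into (isolated, connected) by edge membership."""
    touched = set()
    for u, v in edges:
        touched.add(u)
        touched.add(v)
    isolated = [n for n in nodes if n not in touched]
    connected = [n for n in nodes if n in touched]
    return isolated, connected


def minmax_sample(embeddings, keep):
    """Greedy farthest-point selection over node embeddings.

    Assumption: used here as a stand-in for the paper's edge-free
    minmax sampling. Each step keeps the node whose minimum distance
    to the already-kept set is largest, so near-duplicates go last.
    """
    ids = list(embeddings)
    if keep <= 0:
        return []
    if keep >= len(ids):
        return ids
    chosen = [ids[0]]
    # Distance from every node to its nearest kept node.
    min_dist = {i: math.dist(embeddings[i], embeddings[ids[0]]) for i in ids}
    while len(chosen) < keep:
        candidates = [i for i in ids if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i in ids:
            d = math.dist(embeddings[i], embeddings[nxt])
            min_dist[i] = min(min_dist[i], d)
    return chosen


def time_decay_score(similarity, age, decay=0.1):
    """Down-weight a retrieval similarity by memory age.

    Assumption: exponential decay with a hypothetical rate `decay`;
    the paper's actual weighting may differ.
    """
    return similarity * math.exp(-decay * age)


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d", "e"]
    edges = [("d", "e")]
    isolated, connected = split_by_connectivity(nodes, edges)
    emb = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (10.0, 0.0)}
    kept = minmax_sample({n: emb[n] for n in isolated}, keep=2)
    print(isolated, connected, kept)
```

In this toy run, nodes `a` and `b` are nearly identical, so the sampler keeps `a` and the distant node `c` and evicts `b`. Under the assumed decay form, an older memory needs a higher raw similarity to outrank a recent one.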
