CV, AI, CL · May 8, 2025

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

arXiv:2505.05467v2 · 33 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work enables real-time, interactive video understanding in existing video AI systems. It is an incremental advance that extends existing offline models with streaming features.

The paper adapts offline Video-LLMs to streaming scenarios by addressing two challenges: multi-turn real-time understanding and proactive responses. The authors report significant improvements in streaming understanding, with results that outperform proprietary models such as GPT-4o and Gemini 1.5 Pro.

We present StreamBridge, a simple yet effective framework that seamlessly transforms offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models into online scenarios: (1) limited capability for multi-turn real-time understanding, and (2) lack of proactive response mechanisms. Specifically, StreamBridge incorporates (1) a memory buffer combined with a round-decayed compression strategy, supporting long-context multi-turn interactions, and (2) a decoupled, lightweight activation model that can be effortlessly integrated into existing Video-LLMs, enabling continuous proactive responses. To further support StreamBridge, we construct Stream-IT, a large-scale dataset tailored for streaming video understanding, featuring interleaved video-text sequences and diverse instruction formats. Extensive experiments show that StreamBridge significantly improves the streaming understanding capabilities of offline Video-LLMs across various tasks, outperforming even proprietary models such as GPT-4o and Gemini 1.5 Pro. Simultaneously, it achieves competitive or superior performance on standard video understanding benchmarks.
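To make the two components in the abstract more concrete, here is a minimal sketch of a memory buffer that compresses older rounds first, plus a decoupled activation gate. The abstract does not specify the mechanics, so everything here is an assumption for illustration: mean pooling as the compression operator, a fixed token budget, the class and function names (`MemoryBuffer`, `Round`, `mean_pool`, `should_respond`), and the 0.5 threshold are all hypothetical and may differ from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: a frame is a list of token vectors (plain float lists).
Frame = List[List[float]]


def mean_pool(frame: Frame) -> Frame:
    """Collapse all token vectors of one frame into a single averaged token."""
    n = len(frame)
    dim = len(frame[0])
    return [[sum(tok[d] for tok in frame) / n for d in range(dim)]]


@dataclass
class Round:
    """One interaction round: the frames seen plus any accompanying text."""
    frames: List[Frame]
    text: str = ""

    def num_tokens(self) -> int:
        return sum(len(f) for f in self.frames)


@dataclass
class MemoryBuffer:
    """Hypothetical buffer with a token budget (not the paper's actual API)."""
    max_tokens: int
    rounds: List[Round] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(r.num_tokens() for r in self.rounds)

    def append(self, rnd: Round) -> None:
        self.rounds.append(rnd)
        self._compress()

    def _compress(self) -> None:
        # Walk rounds oldest-first and pool one frame at a time until the
        # budget is met, so the most recent rounds keep full detail longest.
        for rnd in self.rounds:
            for i, frame in enumerate(rnd.frames):
                if self.total_tokens() <= self.max_tokens:
                    return
                if len(frame) > 1:
                    rnd.frames[i] = mean_pool(frame)


def should_respond(score_fn: Callable[[Frame], float],
                   frame: Frame,
                   threshold: float = 0.5) -> bool:
    """Decoupled activation gate: a separate scorer decides when to speak."""
    return score_fn(frame) >= threshold
```

In this sketch the activation scorer is passed in as a plain callable, which mirrors the idea of a lightweight model kept separate from the main Video-LLM: the gate can be swapped or retrained without touching the buffer or the underlying model.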

