OASIS: On-Demand Hierarchical Event Memory for Streaming Video Reasoning
For researchers working on streaming video reasoning, OASIS addresses the bottleneck of retrieving relevant information from an unbounded history without enlarging memory or compressing it more aggressively, improving long-horizon accuracy at bounded token cost.
OASIS introduces a training-free, plug-and-play framework for streaming video reasoning that organizes history into hierarchical events and performs controlled refinement, achieving strong gains in long-horizon accuracy and compositional reasoning with bounded token cost and low request delay.
Streaming video reasoning requires models to operate in a setting where history grows without bound while meaningful evidence remains scarce. In such a landscape, relevant signal is like an oasis: small, critical, and easily lost in a desert of redundancy. Enlarging memory only widens the desert; aggressive compression dries up the oasis. The real difficulty lies in discovering where to look, not how much to remember. We therefore introduce OASIS, a novel framework for streaming video reasoning that tackles this challenge through structured, on-demand retrieval. It organizes streaming history into hierarchical events and performs reasoning as controlled refinement: short-context inference first, followed by semantically grounded retrieval only when uncertainty arises. Because the retrieval is driven by high-level intent rather than embedding similarity, the retrieved memory is substantially more accurate and less noisy. Additionally, the mechanism is plug-and-play, training-free, and readily attaches to different streaming MLLM backbones. Experiments across multiple benchmarks and backbones show that OASIS achieves strong gains in long-horizon accuracy and compositional reasoning with bounded token cost and low request delay. Code is available at https://github.com/Solus-sano/OASIS.
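
The control flow described in the abstract (answer from short context first, then retrieve from the event hierarchy only when uncertain) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Event`, `retrieve`, `answer`, `infer`, and `matches_intent`, the confidence threshold, and the cap on retrieved events are all assumptions made for illustration. The abstract does not say how uncertainty is measured or how intent is matched, so both are left as caller-supplied functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Event:
    """Hypothetical node in a hierarchical event memory."""
    summary: str
    start: float
    end: float
    children: List["Event"] = field(default_factory=list)


def retrieve(events: List[Event],
             matches: Callable[[str], bool],
             max_events: int = 2) -> List[Event]:
    """Walk the hierarchy top-down, descending only into matching events.

    Returns at most `max_events` matching leaf events, which keeps the
    amount of retrieved context bounded.
    """
    hits: List[Event] = []
    stack = list(reversed(events))
    while stack and len(hits) < max_events:
        event = stack.pop()
        if not matches(event.summary):
            continue  # prune the whole subtree
        if event.children:
            stack.extend(reversed(event.children))
        else:
            hits.append(event)
    return hits


def answer(question: str,
           recent_context: List[str],
           memory: List[Event],
           infer: Callable[[str, List[str]], Tuple[str, float]],
           matches_intent: Callable[[str, str], bool],
           threshold: float = 0.7) -> str:
    """Short-context inference first; retrieve only when confidence is low.

    `infer` stands in for a streaming MLLM call returning (answer, confidence);
    `matches_intent` stands in for an intent-level relevance check.
    Both are assumed interfaces, as is `threshold`.
    """
    text, confidence = infer(question, recent_context)
    if confidence >= threshold:
        return text
    hits = retrieve(memory, lambda summary: matches_intent(question, summary))
    refined_context = recent_context + [hit.summary for hit in hits]
    text, _ = infer(question, refined_context)
    return text
```

In this sketch, pruning at the coarse level is what keeps the search cheap: a subtree whose top-level summary does not match the question's intent is never expanded, so the cost depends on the number of relevant events rather than on the length of the stream.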