DB · AI · CV · IR · MM · May 27, 2025

LazyVLM: Neuro-Symbolic Approach to Video Analytics

arXiv:2505.21459v1 · h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses scalable, user-friendly video analytics over open-domain video data. It is an incremental improvement that combines existing neuro-symbolic and VLM approaches.

The paper tackles the trade-off between flexibility and efficiency in video analytics by introducing LazyVLM, a neuro-symbolic system that enables complex multi-frame queries with a user-friendly interface, achieving scalability by decomposing queries into efficient relational and vector search operations.

Current video analytics approaches face a fundamental trade-off between flexibility and efficiency. End-to-end Vision Language Models (VLMs) often struggle with long-context processing and incur high computational costs, while neuro-symbolic methods depend heavily on manual labeling and rigid rule design. In this paper, we introduce LazyVLM, a neuro-symbolic video analytics system that provides a user-friendly query interface similar to VLMs, while addressing their scalability limitation. LazyVLM enables users to effortlessly drop in video data and specify complex multi-frame video queries using a semi-structured text interface for video analytics. To address the scalability limitations of VLMs, LazyVLM decomposes multi-frame video queries into fine-grained operations and offloads the bulk of the processing to efficient relational query execution and vector similarity search. We demonstrate that LazyVLM provides a robust, efficient, and user-friendly solution for querying open-domain video data at scale.
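The decomposition described in the abstract can be illustrated with a small sketch. This is not LazyVLM's implementation: the index layout, the two-dimensional embeddings, the similarity threshold, and the names `ENTITY_INDEX`, `vector_search`, and `run_query` are all assumptions made for illustration. The sketch answers a toy multi-frame query ("an entity matching description A appears, then an entity matching description B appears within a few frames") by first running a vector similarity search per description and then joining the hits relationally on frame order.

```python
import math
import sqlite3

# Hypothetical precomputed index: (frame_id, entity_id, embedding).
# Real embeddings would come from a vision model; these are toy 2-D vectors.
ENTITY_INDEX = [
    (0, "e1", (1.0, 0.0)),
    (1, "e1", (0.9, 0.1)),
    (2, "e2", (0.0, 1.0)),
    (5, "e2", (0.1, 0.9)),
    (9, "e3", (0.7, 0.7)),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vec, threshold=0.9):
    """Step 1: brute-force similarity search over the entity index."""
    return [
        (frame_id, entity_id)
        for frame_id, entity_id, emb in ENTITY_INDEX
        if cosine(query_vec, emb) >= threshold
    ]


def run_query(first_vec, second_vec, max_gap):
    """Step 2: relational join of the two hit sets on frame order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE first_hits (frame_id INTEGER, entity_id TEXT)")
    conn.execute("CREATE TABLE second_hits (frame_id INTEGER, entity_id TEXT)")
    conn.executemany("INSERT INTO first_hits VALUES (?, ?)", vector_search(first_vec))
    conn.executemany("INSERT INTO second_hits VALUES (?, ?)", vector_search(second_vec))
    rows = conn.execute(
        "SELECT a.frame_id, b.frame_id "
        "FROM first_hits AS a JOIN second_hits AS b "
        "ON b.frame_id > a.frame_id AND b.frame_id - a.frame_id <= ? "
        "ORDER BY a.frame_id, b.frame_id",
        (max_gap,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Frame pairs where an A-like entity is followed by a B-like entity
    # within 3 frames.
    print(run_query((1.0, 0.0), (0.0, 1.0), max_gap=3))
```

In this sketch the expensive model work is confined to producing embeddings ahead of time, while the multi-frame constraint is evaluated by an ordinary SQL join. A real system would presumably replace the brute-force scan with an approximate nearest-neighbor index.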

