CV DB MM IV · Jul 15, 2020

VidCEP: Complex Event Processing Framework to Detect Spatiotemporal Patterns in Video Streams

arXiv:2007.07817v17.239 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of querying unstructured video data for real-time event detection. The advance is incremental: it builds on existing CEP systems and adds video-specific adaptations.

The authors tackle the problem of detecting spatiotemporal event patterns in video streams by proposing VidCEP, a complex event processing framework that uses a graph-based event representation and deep neural networks. It achieves F-scores from 0.66 to 0.89 and near real-time performance, with an average throughput of 70 frames per second for 5 parallel videos.

Video data is highly expressive but has traditionally been very difficult for machines to interpret. Querying event patterns from video streams is challenging because of their unstructured representation. Middleware systems such as Complex Event Processing (CEP) mine patterns from data streams and send notifications to users in a timely fashion. However, current CEP systems are inherently limited in querying video streams because of the unstructured data model of video and the lack of an expressive query language. In this work, we focus on a CEP framework in which users can define high-level, expressive queries over videos to detect a range of spatiotemporal event patterns. In this context, we propose: i) VidCEP, an in-memory, on-the-fly, near real-time complex event matching framework for video streams, which uses a graph-based event representation that enables the detection of high-level semantic concepts from video using cascades of Deep Neural Network models; ii) a Video Event Query Language (VEQL) to express high-level user queries over video streams in CEP; and iii) a complex event matcher that detects spatiotemporal video event patterns by matching expressive user queries over video data. The proposed approach detects spatiotemporal video event patterns with an F-score ranging from 0.66 to 0.89. VidCEP maintains near real-time performance, with an average throughput of 70 frames per second for 5 parallel videos and sub-second matching latency.
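The abstract does not specify VidCEP's internal data model or VEQL syntax. As a rough illustration only, the Python sketch below assumes a much simpler scheme. Each frame's detected objects become a small graph whose edges are pairwise spatial relations. A sliding-window matcher then reports when a spatial pattern such as "car left_of person" persists across recent frames. All names here are hypothetical and are not VidCEP's API: `Detection`, `FrameGraph`, `build_frame_graph`, `WindowedPatternMatcher`, and the `left_of`/`right_of` relation. Hand-written bounding boxes stand in for the output of the DNN cascade.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Detection:
    """One object in one frame (stand-in for a DNN detector's output; hypothetical)."""
    oid: str                                  # object/track id
    label: str                                # class label, e.g. "car"
    bbox: tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels


@dataclass
class FrameGraph:
    """Per-frame graph: objects are nodes, pairwise spatial relations are edges."""
    frame_id: int
    nodes: dict[str, Detection] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # (src, relation, dst)


def horizontal_relation(a: Detection, b: Detection) -> str:
    """Hypothetical spatial relation: compare bounding-box centre x-coordinates."""
    ax = (a.bbox[0] + a.bbox[2]) / 2
    bx = (b.bbox[0] + b.bbox[2]) / 2
    if ax < bx:
        return "left_of"
    if ax > bx:
        return "right_of"
    return "aligned"


def build_frame_graph(frame_id: int, detections: list[Detection]) -> FrameGraph:
    """Add every detection as a node and every ordered pair as a relation edge."""
    g = FrameGraph(frame_id)
    for d in detections:
        g.nodes[d.oid] = d
    for a in detections:
        for b in detections:
            if a.oid != b.oid:
                g.edges.add((a.oid, horizontal_relation(a, b), b.oid))
    return g


def frame_matches(g: FrameGraph, label_a: str, relation: str, label_b: str) -> bool:
    """True if some edge links an object labelled label_a to one labelled label_b."""
    return any(
        g.nodes[s].label == label_a and r == relation and g.nodes[d].label == label_b
        for s, r, d in g.edges
    )


class WindowedPatternMatcher:
    """Fires when the pattern holds in at least `min_count` of the last `window` frames."""

    def __init__(self, label_a: str, relation: str, label_b: str, window: int, min_count: int):
        self.pattern = (label_a, relation, label_b)
        self.min_count = min_count
        self.history = deque(maxlen=window)  # per-frame match flags, oldest dropped first

    def push(self, graph: FrameGraph) -> bool:
        self.history.append(frame_matches(graph, *self.pattern))
        return sum(self.history) >= self.min_count


# Usage: a car that stays left of a person for 3 consecutive frames, then passes them.
person = Detection("p1", "person", (50, 0, 60, 20))
car_x = [0, 5, 10, 70]  # left edge of the car's box in each frame
matcher = WindowedPatternMatcher("car", "left_of", "person", window=3, min_count=3)
results = []
for i, x in enumerate(car_x):
    car = Detection("c1", "car", (x, 0, x + 10, 10))
    results.append(matcher.push(build_frame_graph(i, [car, person])))
print(results)
```

The window makes the match temporal rather than per-frame: a single frame showing the car to the left of the person is not enough, and the match stops firing once the car has moved past. A real system would also need object tracking across frames, richer spatial and temporal operators, and a query language on top, none of which this sketch attempts.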

