Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference
This work addresses efficient video analytics on edge devices. Its system-level optimization is incremental in nature but yields significant performance gains for real-time applications.
The paper tackles high-performance object detection on high-resolution video streams on resource-constrained edge devices. It introduces Mondrian, a system that uses Compressive Packed Inference to process only the pixels that are needed and to maximize accelerator utilization. The reported results are 15.0-19.7% higher accuracy than state-of-the-art baselines and 6.65x higher throughput than frame-wise inference.
In this paper, we present Mondrian, an edge system that enables high-performance object detection on high-resolution video streams. Many lightweight models and system optimization techniques have been proposed for resource-constrained devices, but they do not fully utilize the potential of the accelerators over dynamic, high-resolution videos. To enable such capability, we devise Compressive Packed Inference, a novel technique that minimizes per-pixel processing costs by selectively determining the necessary pixels to process and combining them to maximize processing parallelism. In particular, our system quickly extracts ROIs and dynamically shrinks them to reflect the fast-changing characteristics of objects and scenes. It then intelligently combines the scaled ROIs into large canvases to maximize the utilization of inference accelerators such as GPUs. Evaluation across various datasets, models, and devices shows that Mondrian outperforms state-of-the-art baselines (e.g., input rescaling, ROI extraction, ROI extraction + batching) with 15.0-19.7% higher accuracy, while achieving 6.65$\times$ higher throughput than frame-wise inference when processing various 1080p video streams. We will release the code after the paper review.
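To make the "shrink ROIs, then pack them into canvases" idea concrete, the following is a minimal sketch and not Mondrian's actual algorithm, which the abstract does not specify. It assumes ROIs are given as (width, height) pairs, that a per-ROI scale factor has already been chosen, and it substitutes a generic shelf first-fit heuristic for the packing step. The names `scale_roi`, `pack_rois`, and `Placement` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Placement:
    roi_id: int   # index of the ROI in the input list
    canvas: int   # index of the canvas the ROI was placed on
    x: int        # left edge on the canvas
    y: int        # top edge on the canvas
    w: int
    h: int


def scale_roi(w: int, h: int, scale: float) -> Tuple[int, int]:
    """Shrink an ROI by a scale factor (assumed to be chosen elsewhere)."""
    return max(1, round(w * scale)), max(1, round(h * scale))


def pack_rois(rois: List[Tuple[int, int]], canvas_w: int, canvas_h: int):
    """Shelf first-fit packing of (w, h) ROIs into fixed-size canvases.

    ROIs are processed tallest first. Each canvas holds horizontal shelves,
    stored as [top_y, shelf_height, next_free_x].
    Returns (placements, number_of_canvases).
    """
    order = sorted(range(len(rois)), key=lambda i: -rois[i][1])
    canvases: List[List[List[int]]] = []
    placements: List[Placement] = []
    for i in order:
        w, h = rois[i]
        if w > canvas_w or h > canvas_h:
            raise ValueError(f"ROI {i} ({w}x{h}) does not fit on a canvas")
        placed = False
        for c, shelves in enumerate(canvases):
            # Try the existing shelves of this canvas first.
            for shelf in shelves:
                if h <= shelf[1] and shelf[2] + w <= canvas_w:
                    placements.append(Placement(i, c, shelf[2], shelf[0], w, h))
                    shelf[2] += w
                    placed = True
                    break
            if placed:
                break
            # Otherwise open a new shelf below the last one, if it fits.
            top = shelves[-1][0] + shelves[-1][1]
            if top + h <= canvas_h:
                shelves.append([top, h, w])
                placements.append(Placement(i, c, 0, top, w, h))
                placed = True
                break
        if not placed:
            # No existing canvas has room: start a new canvas.
            canvases.append([[0, h, w]])
            placements.append(Placement(i, len(canvases) - 1, 0, 0, w, h))
    return placements, len(canvases)


# Usage: shrink four ROIs by half, then pack them into 640x640 canvases.
raw = [(800, 600), (800, 600), (400, 400), (400, 400)]
scaled = [scale_roi(w, h, 0.5) for w, h in raw]
placements, n_canvases = pack_rois(scaled, 640, 640)
```

In this sketch, fewer and fuller canvases mean fewer inference calls with less padding, which is the intuition behind packing for accelerator utilization. A real system would also need to map detections on a canvas back to frame coordinates using the stored placements and scale factors.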