MM, CV | Aug 12, 2024

Rethinking Video with a Universal Event-Based Representation

arXiv:2408.06248v1 | 2 citations | h-index: 2

Originality: Incremental advance

AI Analysis

The work addresses computational redundancy in video systems, with applications such as large-scale surveillance and resource-constrained sensing. It appears incremental because it builds on existing event camera paradigms.

The paper tackles the inefficiency of traditional video systems and fragmented event camera representations by introducing ADΔER, a universal event-based representation that achieves state-of-the-art application speed and compression performance for scenes with high temporal redundancy.

Traditionally, video is structured as a sequence of discrete image frames. Recently, however, a novel video sensing paradigm has emerged which eschews video frames entirely. These "event" sensors aim to mimic the human vision system with asynchronous sensing, where each pixel has an independent, sparse data stream. While these cameras enable high-speed and high-dynamic-range sensing, researchers often revert to a framed representation of the event data for existing applications, or build bespoke applications for a particular camera's event data type. At the same time, classical video systems have significant computational redundancy at the application layer, since pixel samples are repeated across frames in the uncompressed domain. To address the shortcomings of existing systems, I introduce Address, Decimation, Δt Event Representation (ADΔER, pronounced "adder"), a novel intermediate video representation and system framework. The framework transcodes a variety of framed and event camera sources into a single event-based representation, which supports source-modeled lossy compression and backward compatibility with traditional frame-based applications. I demonstrate that ADΔER achieves state-of-the-art application speed and compression performance for scenes with high temporal redundancy. Crucially, I describe how ADΔER unlocks an entirely new control mechanism for computer vision: application speed can correlate with both the scene content and the level of lossy compression. Finally, I discuss the implications for event-based video on large-scale video surveillance and resource-constrained sensing.
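The temporal redundancy argument in the abstract can be illustrated with a toy sketch. The code below is not the ADΔER format or its transcoder (the abstract does not specify either). It is a simplified change-detection scheme, assumed here purely for illustration, in which a pixel emits an event only when its value moves by more than a threshold. The names `Event`, `frames_to_events`, `events_to_frames`, and `threshold` are hypothetical.

```python
from typing import List, NamedTuple

Frame = List[List[int]]  # rows of pixel intensities


class Event(NamedTuple):
    """Hypothetical per-pixel event: location, frame index, new intensity."""
    x: int
    y: int
    t: int
    value: int


def frames_to_events(frames: List[Frame], threshold: int = 0) -> List[Event]:
    """Emit an event only when a pixel differs from its last emitted value
    by more than `threshold`. A larger threshold means lossier output."""
    events: List[Event] = []
    last: Frame = []
    for t, frame in enumerate(frames):
        if t == 0:
            # The first frame has no history, so every pixel emits an event.
            last = [row[:] for row in frame]
            for y, row in enumerate(frame):
                for x, v in enumerate(row):
                    events.append(Event(x, y, t, v))
            continue
        for y, row in enumerate(frame):
            for x, v in enumerate(row):
                if abs(v - last[y][x]) > threshold:
                    events.append(Event(x, y, t, v))
                    last[y][x] = v
    return events


def events_to_frames(events: List[Event], width: int, height: int,
                     num_frames: int) -> List[Frame]:
    """Rebuild frames by holding each pixel's last event value, which gives
    backward compatibility with frame-based consumers in this toy model."""
    by_t = {}
    for e in events:
        by_t.setdefault(e.t, []).append(e)
    current = [[0] * width for _ in range(height)]
    frames: List[Frame] = []
    for t in range(num_frames):
        for e in by_t.get(t, []):
            current[e.y][e.x] = e.value
        frames.append([row[:] for row in current])
    return frames


if __name__ == "__main__":
    # Three 2x2 frames in which a single pixel changes in the last frame.
    video = [
        [[10, 10], [10, 10]],
        [[10, 10], [10, 10]],
        [[10, 50], [10, 10]],
    ]
    events = frames_to_events(video)
    print(len(events), "events for", 3 * 2 * 2, "framed pixel samples")
```

In this sketch a downstream application that iterates over events does work proportional to how much the scene changes, and raising `threshold` reduces the event count further at the cost of fidelity. That is the kind of coupling between scene content, lossy compression, and application speed the abstract describes, though the actual ADΔER mechanism may differ substantially.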

