Taming Contrast Maximization for Learning Sequential, Low-latency, Event-based Optical Flow
This work addresses the need for low-latency, low-power computer vision solutions using event cameras, and it presents a novel method rather than an incremental improvement.
The paper tackles the problem of estimating optical flow from event cameras by proposing a self-supervised learning pipeline that scales to high inference frequencies. Across multiple datasets, it achieves state-of-the-art accuracy among approaches trained or optimized without ground truth.
Event cameras have recently gained significant traction since they open up new avenues for low-latency and low-power solutions to complex computer vision problems. To unlock these solutions, it is necessary to develop algorithms that can leverage the unique nature of event data. However, the current state of the art is still highly influenced by the frame-based literature, and usually fails to deliver on these promises. In this work, we take this into consideration and propose a novel self-supervised learning pipeline for the sequential estimation of event-based optical flow that allows the models to scale to high inference frequencies. At its core is a continuously running stateful neural model that is trained using a novel formulation of contrast maximization that makes it robust to nonlinearities and varying statistics in the input events. Results across multiple datasets confirm the effectiveness of our method, which establishes a new state of the art in terms of accuracy for approaches trained or optimized without ground truth.
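
For context, the classical contrast maximization objective that such a pipeline builds on can be sketched as follows. This is a minimal illustration of the generic idea only, not the novel formulation proposed in the paper: events are warped to a reference time along a candidate flow, accumulated into an image, and scored by the variance of that image, which is highest when the flow aligns events triggered by the same edge. The function names, the single global flow, the nearest-pixel accumulation, the grid search, and the synthetic moving-edge data are all assumptions made for illustration.

```python
import math


def warp_events(events, flow, t_ref=0.0):
    """Transport each event (x, y, t) to time t_ref along a constant flow (u, v)."""
    u, v = flow
    return [(x + (t_ref - t) * u, y + (t_ref - t) * v) for x, y, t in events]


def contrast(events, flow, width, height, t_ref=0.0):
    """Variance of the image of warped events (hypothetical helper).

    Events are accumulated at the nearest pixel; events warped outside
    the image are dropped.
    """
    image = [[0] * width for _ in range(height)]
    for xw, yw in warp_events(events, flow, t_ref):
        col = math.floor(xw + 0.5)
        row = math.floor(yw + 0.5)
        if 0 <= col < width and 0 <= row < height:
            image[row][col] += 1
    n_pixels = width * height
    mean = sum(sum(r) for r in image) / n_pixels
    mean_sq = sum(c * c for r in image for c in r) / n_pixels
    return mean_sq - mean * mean


def make_moving_edge_events(x0=2.0, speed=2.0, n_steps=8, dt=0.5, height=8):
    """Synthetic events from a vertical edge moving right at `speed` px per time unit."""
    return [
        (x0 + speed * k * dt, float(y), k * dt)
        for k in range(n_steps)
        for y in range(height)
    ]


WIDTH, HEIGHT = 16, 8
events = make_moving_edge_events(height=HEIGHT)

# Grid search over a few candidate horizontal flows; the candidate that
# best compensates the motion yields the sharpest (highest-variance) image.
candidates = [(float(u), 0.0) for u in range(4)]
best_flow = max(candidates, key=lambda f: contrast(events, f, WIDTH, HEIGHT))
print(best_flow)  # (2.0, 0.0)
```

With the true flow, all 64 synthetic events collapse onto a single image column, so the contrast is far higher than with zero flow, where the events stay smeared over eight columns. A learning-based pipeline would replace the grid search with a network that predicts per-pixel flow and would use a differentiable version of this objective as its training loss.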