Representation Learning on Event Stream via an Elastic Net-incorporated Tensor Network
The paper tackles the problem of processing event camera data by developing a tensor decomposition method that captures global spatiotemporal correlations in event streams, and it reports effective noise filtering relative to state-of-the-art methods.
This approach addresses a limitation of existing methods, which aggregate events only locally, for applications such as noise filtering in event-based vision.
Event cameras are neuromorphic sensors that capture an asynchronous, sparse event stream whenever per-pixel brightness changes. State-of-the-art processing methods for event signals typically aggregate events into a frame or a grid. However, because events are dense in time, this stacking limits such methods to local information about the events. In this paper, we present a novel spatiotemporal representation learning method that captures the global correlations of all events in the event stream simultaneously through tensor decomposition. In addition, because events are sparse in space, we propose an Elastic Net-incorporated tensor network (ENTN) model to obtain more spatial and temporal details about the event stream. Empirically, the results indicate that our method represents the spatiotemporal correlation of events with high quality and achieves effective results in applications such as noise filtering compared with state-of-the-art methods.
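As a rough illustration of the regularizer the ENTN name refers to, the sketch below computes a generic Elastic Net penalty (a weighted sum of L1 and squared L2 terms) over the entries of a factor, together with its standard proximal step, which sets small entries exactly to zero and shrinks the rest. It is not the ENTN model itself: the abstract does not specify how the penalty enters the tensor network, so the function names, the weights `l1_weight` and `l2_weight`, and the example values in `factor_entries` are all hypothetical.

```python
def elastic_net_penalty(values, l1_weight, l2_weight):
    """Generic Elastic Net penalty: l1_weight * sum|v| + l2_weight * sum v^2."""
    l1 = sum(abs(v) for v in values)
    l2 = sum(v * v for v in values)
    return l1_weight * l1 + l2_weight * l2


def elastic_net_prox(v, l1_weight, l2_weight):
    """Minimiser over x of 0.5*(x - v)**2 + l1_weight*|x| + l2_weight*x**2.

    The L1 term soft-thresholds v (small entries become exactly zero);
    the squared L2 term then scales the survivor toward zero.
    """
    shrunk = max(abs(v) - l1_weight, 0.0)
    sign = 1.0 if v >= 0 else -1.0
    return sign * shrunk / (1.0 + 2.0 * l2_weight)


# Hypothetical entries of one factor in a tensor decomposition.
factor_entries = [2.0, -1.0, 0.25, 0.0]

penalty = elastic_net_penalty(factor_entries, l1_weight=0.5, l2_weight=0.25)
shrunk_entries = [
    elastic_net_prox(v, l1_weight=0.5, l2_weight=0.25) for v in factor_entries
]
```

In this sketch the entry 0.25 falls below the L1 threshold and is zeroed, while the larger entries are kept but reduced in magnitude. This combination of sparsity and shrinkage is the general reason an Elastic Net term is paired with sparse data.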