AET-EFN: A Versatile Design for Static and Dynamic Event-Based Vision
This work addresses a critical bottleneck in event-based vision for robotics and autonomous systems: handling static and dynamic scenes with one universal design. The contribution appears incremental, however, since it builds on existing representation methods.
The authors tackle the challenge of processing noisy, sparse event data from neuromorphic cameras by proposing the Aligned Event Tensor (AET) representation and the Event Frame Net (EFN) framework, which are reported to outperform state-of-the-art methods by large margins and to achieve the fastest inference speed among the compared methods.
Neuromorphic event cameras, which capture optical changes in a scene, have drawn increasing attention due to their high speed and low power consumption. However, event data are noisy, sparse, and nonuniform in the spatio-temporal domain, with an extremely high temporal resolution, which makes it challenging to design backend algorithms for event-based vision. Existing methods encode events into point-cloud-based or voxel-based representations, but suffer from noise and/or information loss. Additionally, little research has systematically studied how to handle static and dynamic scenes with one universal design for event-based vision. This work proposes the Aligned Event Tensor (AET), a novel event data representation, and a neat framework called Event Frame Net (EFN), which enables our model to handle event-based vision in both static and dynamic scenes. The proposed AET and EFN are evaluated on various datasets and shown to surpass existing state-of-the-art methods by large margins. Our method is also efficient and achieves the fastest inference speed among the compared methods.
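
To make the information loss in voxel-based encodings concrete, the following is a minimal sketch of a generic voxel-grid accumulation. It is not the proposed AET, whose construction is not described in this abstract. The event layout `(x, y, timestamp, polarity)`, the function name `events_to_voxel_grid`, and the uniform temporal binning are assumptions chosen for illustration, and pure Python lists stand in for the array library a real pipeline would use.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp in seconds, polarity in {-1, +1}).
Event = Tuple[int, int, float, int]


def events_to_voxel_grid(events: List[Event], width: int, height: int,
                         num_bins: int) -> List[List[List[int]]]:
    """Accumulate signed polarities into a [num_bins][height][width] grid."""
    grid = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t_start = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    span = (t_end - t_start) or 1.0  # guard against a zero-length time window
    for x, y, t, p in events:
        # Map the timestamp to a uniform temporal bin; clamp the latest
        # event into the final bin.
        b = min(int((t - t_start) / span * num_bins), num_bins - 1)
        grid[b][y][x] += p
    return grid


# Hypothetical events: two of them hit pixel (1, 0) within the same bin.
events = [(0, 0, 0.00, +1), (1, 0, 0.01, -1), (1, 0, 0.02, -1), (2, 1, 0.09, +1)]
grid = events_to_voxel_grid(events, width=3, height=2, num_bins=2)
```

In this sketch the two events at pixel (1, 0) collapse into a single cell, so their individual timestamps can no longer be recovered. Events of opposite polarity in one cell would cancel in the same way.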