CV · May 22, 2025

V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation

arXiv:2505.16797v1 · 2 citations · h-index: 13 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the data scarcity and storage inefficiency that limit event-based vision models, enabling larger-scale training and better generalization in applications such as robotics and autonomous systems.

The paper tackles the problem of scaling event-based vision training datasets by introducing Video-to-Voxel (V2V), which converts video frames directly into event-based voxel grids. This yields a 150-fold reduction in storage requirements and enables training on 10,000 videos totaling 52 hours, leading to substantial improvements in video reconstruction and optical flow estimation.

Event-based cameras offer unique advantages such as high temporal resolution, high dynamic range, and low power consumption. However, the massive storage requirements and I/O burdens of existing synthetic data generation pipelines and the scarcity of real data prevent event-based training datasets from scaling up, limiting the development and generalization capabilities of event vision models. To address this challenge, we introduce Video-to-Voxel (V2V), an approach that directly converts conventional video frames into event-based voxel grid representations, bypassing the storage-intensive event stream generation entirely. V2V enables a 150 times reduction in storage requirements while supporting on-the-fly parameter randomization for enhanced model robustness. Leveraging this efficiency, we train several video reconstruction and optical flow estimation model architectures on 10,000 diverse videos totaling 52 hours--an order of magnitude larger than existing event datasets, yielding substantial improvements.
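To make the idea of converting frames directly into a voxel grid concrete, here is a minimal sketch. It is not the paper's implementation. It assumes the common event-camera model in which an event fires whenever log intensity changes by a contrast threshold, so the net event count between two frames is approximated by the log-intensity difference divided by that threshold. The function name `frames_to_voxel`, its parameters, and the bilinear temporal binning are illustrative assumptions.

```python
import numpy as np


def frames_to_voxel(frames, num_bins=5, contrast_threshold=0.2, eps=1e-3):
    """Approximate an event voxel grid directly from grayscale frames.

    Hypothetical helper, not the authors' code.
    frames: array of shape (T, H, W) with intensities in [0, 1], T >= 2.
    Returns an array of shape (num_bins, H, W).
    """
    frames = np.asarray(frames, dtype=np.float32)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("frames must have shape (T, H, W) with T >= 2")

    # Event cameras respond to changes in log intensity.
    log_frames = np.log(frames + eps)

    # Signed log-intensity change per frame interval, in units of the
    # contrast threshold: an approximation of the net event count
    # (positive minus negative events) at each pixel.
    net_events = np.diff(log_frames, axis=0) / contrast_threshold
    num_intervals = net_events.shape[0]

    voxel = np.zeros((num_bins,) + frames.shape[1:], dtype=np.float32)

    # Map each interval midpoint to a continuous bin coordinate in
    # [0, num_bins - 1] and split its contribution between the two
    # nearest bins (bilinear weighting along the time axis).
    centers = (np.arange(num_intervals) + 0.5) / num_intervals * (num_bins - 1)
    for k, c in enumerate(centers):
        lo = int(np.floor(c))
        hi = min(lo + 1, num_bins - 1)
        w_hi = c - lo
        voxel[lo] += (1.0 - w_hi) * net_events[k]
        voxel[hi] += w_hi * net_events[k]
    return voxel


# Usage: the contrast threshold can be re-sampled on every call, which is
# one way to read "on-the-fly parameter randomization". The sampling range
# below is an arbitrary assumption.
rng = np.random.default_rng(0)
clip = rng.uniform(0.0, 1.0, size=(6, 4, 5)).astype(np.float32)
voxel = frames_to_voxel(clip, contrast_threshold=rng.uniform(0.1, 0.5))
```

Because only the source video needs to be stored and the voxel grid is computed at load time, no intermediate event stream is written to disk, which is the kind of saving the abstract describes.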

Code Implementations: 1 repo