CV, Sep 26, 2025

ControlEvents: Controllable Synthesis of Event Camera Data with Foundational Prior from Image Diffusion Models

Yixuan Hu, Yuxuan Xue, Simon Klenk, Daniel Cremers, Gerard Pons-Moll

arXiv:2509.22864v1, 6.21 citations, h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the scarcity and high cost of labeled data in event-based vision, giving researchers and practitioners a more efficient way to generate labeled event datasets. It is an incremental advance because it builds on existing diffusion models.

The paper tackles the difficulty of obtaining large-scale labeled ground-truth data for event-based vision tasks. It presents ControlEvents, a diffusion-based generative model that synthesizes high-quality event data guided by control signals such as text labels, 2D skeletons, and 3D body poses. The synthesized data improves model performance on visual recognition, 2D skeleton estimation, and 3D body pose estimation.

In recent years, event cameras have gained significant attention due to their bio-inspired properties, such as high temporal resolution and high dynamic range. However, obtaining large-scale labeled ground-truth data for event-based vision tasks remains challenging and costly. In this paper, we present ControlEvents, a diffusion-based generative model designed to synthesize high-quality event data guided by diverse control signals such as class text labels, 2D skeletons, and 3D body poses. Our key insight is to leverage the diffusion prior from foundation models, such as Stable Diffusion, enabling high-quality event data generation with minimal fine-tuning and limited labeled data. Our method streamlines the data generation process and significantly reduces the cost of producing labeled event datasets. We demonstrate the effectiveness of our approach by synthesizing event data for visual recognition, 2D skeleton estimation, and 3D body pose estimation. Our experiments show that the synthesized labeled event data enhances model performance in all tasks. Additionally, our approach can generate events from text labels that were not seen during training, illustrating the powerful text-based generation capabilities inherited from foundation models.
