Diversifying Human Pose in Synthetic Data for Aerial-view Human Detection
This work addresses the data scarcity problem faced by researchers and practitioners in aerial-view human detection, though it is an incremental improvement on existing synthetic data methods.
The paper tackles the challenge of diversifying human poses in synthetic data for aerial-view human detection. It proposes SynPoseDiv, a framework that generates realistic 3D poses and translates images of virtual characters into those poses. The result is a significant improvement in detection accuracy across multiple benchmarks, particularly in low-shot scenarios.
Synthetic data generation has emerged as a promising solution to the data scarcity issue in aerial-view human detection. However, creating datasets that accurately reflect varying real-world human appearances, particularly diverse poses, remains challenging and labor-intensive. To address this, we propose SynPoseDiv, a novel framework that diversifies human poses within existing synthetic datasets. SynPoseDiv tackles two key challenges: generating realistic, diverse 3D human poses using a diffusion-based pose generator, and producing images of virtual characters in novel poses through a source-to-target image translator. The framework incrementally transitions characters into new poses using optimized pose sequences identified via Dijkstra's algorithm. Experiments demonstrate that SynPoseDiv significantly improves detection accuracy across multiple aerial-view human detection benchmarks, especially in low-shot scenarios, and remains effective regardless of the training approach or dataset size.
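The abstract says that pose sequences are identified via Dijkstra's algorithm but does not describe the graph it runs on. The following is a minimal sketch of one plausible reading, not the authors' implementation: each pose is a node, two poses are linked when they are close enough to be reached in one translation step, and the shortest path from a source pose to a target pose gives the incremental sequence. The Euclidean metric, the `max_step` threshold, the toy 2D pose vectors, and all function names are assumptions made for illustration.

```python
import heapq
import math


def pose_distance(a, b):
    """Euclidean distance between two flattened pose vectors (assumed metric)."""
    return math.dist(a, b)


def build_pose_graph(poses, max_step):
    """Link every pair of poses whose distance is at most max_step (assumed rule)."""
    graph = {i: [] for i in range(len(poses))}
    for i in range(len(poses)):
        for j in range(i + 1, len(poses)):
            d = pose_distance(poses[i], poses[j])
            if d <= max_step:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def shortest_pose_sequence(graph, source, target):
    """Dijkstra's algorithm: node indices from source to target, or None if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry; u was already settled with a shorter distance
        visited.add(u)
        if u == target:
            break
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target to recover the sequence.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path


# Toy poses: four evenly spaced points on a line plus one far-away outlier.
poses = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 5.0)]
graph = build_pose_graph(poses, max_step=1.5)
path = shortest_pose_sequence(graph, 0, 3)
```

In this toy setup the direct jump from pose 0 to pose 3 exceeds `max_step`, so the search passes through the intermediate poses, which mirrors the idea of transitioning a character into a new pose through small steps. The outlier pose has no neighbor within the threshold, so a query to it returns `None`.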