Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection
This work is aimed at computer vision researchers and practitioners and offers an incremental improvement to data augmentation techniques for object detection.
The paper tackles the challenge of balancing dataset diversity with semantic coordination in data augmentation for object detection. It introduces a diffusion-based method that uses a Category Affinity Matrix to increase diversity and Surrounding Region Alignment to preserve semantic coordination. The method yields average improvements of +1.4AP, +0.9AP, and +3.4AP over existing alternatives on three object detection models, respectively.
Recent studies emphasize the crucial role of data augmentation in enhancing the performance of object detection models. However, existing methodologies often struggle to balance dataset diversity with semantic coordination. To bridge this gap, we introduce an augmentation technique that leverages pre-trained conditional diffusion models to mediate this balance. Our approach comprises a Category Affinity Matrix, designed to enhance dataset diversity, and a Surrounding Region Alignment strategy, which preserves semantic coordination in the augmented images. Extensive experimental evaluations confirm the efficacy of our method in enriching dataset diversity while maintaining semantic coordination. Our method yields substantial average improvements of +1.4AP, +0.9AP, and +3.4AP over existing alternatives on three distinct object detection models, respectively.