CV · Apr 3

Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

arXiv:2604.02966 · 20.4 · Has Code
Predicted impact: top 33% in CV · last 90 days · Originality: Highly original
AI Analysis

This work addresses the challenge of limited training data for object detection in UAV applications. The contribution is incremental in that it builds on existing layout-to-image generation methods.

The paper tackles the problem of generating high-fidelity synthetic images for UAV-based object detection to compensate for limited annotated data. The authors report that the resulting method significantly outperforms state-of-the-art approaches and consistently improves detection accuracy.

Unmanned aerial vehicle (UAV) based object detection is a critical but challenging task when applied in dynamically changing scenarios with limited annotated training data. Layout-to-image generation approaches have proved effective in improving detection accuracy by synthesizing labeled images with diffusion models. However, they frequently produce artifacts, especially near the layout boundaries of tiny objects, which substantially limits their performance. To address this issue, we propose UAVGen, a novel layout-to-image generation framework tailored for UAV-based object detection. Specifically, UAVGen introduces a Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs representative instances for each class and integrates them into latent embeddings for high-fidelity object generation. Moreover, a Focal Region Enhanced Data Pipeline (FRE-DP) is introduced to emphasize object-concentrated foreground regions in synthesis, combined with a label refinement step that corrects missing, extra, and misaligned generations. Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art approaches and consistently improves accuracy when integrated with different detectors. The source code is available at https://github.com/Sirius-Li/UAVGen.
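
The abstract mentions a label refinement that corrects missing, extra, and misaligned generations but does not say how it works. The following is only a minimal sketch of one plausible approach, not the authors' implementation: it assumes a detector has already been run on the synthesized image and compares its boxes with the input layout by IoU. The function names `iou` and `refine_labels` and the thresholds `match_thr` and `keep_thr` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine_labels(
    layout: List[Box],
    detected: List[Box],
    match_thr: float = 0.3,  # assumed: below this, the object counts as missing
    keep_thr: float = 0.7,   # assumed: at or above this, the layout box is kept
) -> Tuple[List[Box], Dict[str, int]]:
    """Greedily match layout boxes to detected boxes and fix the labels."""
    refined: List[Box] = []
    report = {"kept": 0, "misaligned": 0, "missing": 0, "extra": 0}
    used = set()

    for lb in layout:
        best_j, best_iou = -1, 0.0
        for j, db in enumerate(detected):
            if j in used:
                continue
            v = iou(lb, db)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_iou >= keep_thr:
            refined.append(lb)  # generation agrees with the layout
            used.add(best_j)
            report["kept"] += 1
        elif best_iou >= match_thr:
            refined.append(detected[best_j])  # misaligned: adopt detected box
            used.add(best_j)
            report["misaligned"] += 1
        else:
            report["missing"] += 1  # object was not generated: drop the label

    for j, db in enumerate(detected):
        if j not in used:
            refined.append(db)  # extra object: add a label for it
            report["extra"] += 1

    return refined, report


layout = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
detected = [(0, 0, 10, 10), (22, 22, 32, 32), (80, 80, 90, 90)]
refined, report = refine_labels(layout, detected)
print(refined)
print(report)
```

In this toy input the first layout box is kept, the second is replaced by the shifted detection, the third is dropped as missing, and the unmatched detection is added as an extra object. A real pipeline would also need class labels and detector confidence scores, which this sketch leaves out.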

