TerraGen: A Unified Multi-Task Layout Generation Framework for Remote Sensing Data Augmentation
This work addresses the need for spatially controllable data augmentation across multiple remote sensing vision tasks, although it is incremental because it builds on existing layout-to-image methods.
The paper tackles the problem of task-isolated generative data augmentation in remote sensing by proposing TerraGen, a unified layout-to-image framework that achieves the best generation quality across diverse tasks and significantly enhances downstream performance in both full-data and few-shot scenarios.
Remote sensing vision tasks require extensive labeled data across multiple, interconnected domains. However, current generative data augmentation frameworks are task-isolated, i.e., each vision task requires training an independent generative model, and they ignore the modeling of geographical information and spatial constraints. To address these issues, we propose \textbf{TerraGen}, a unified layout-to-image generation framework that enables flexible, spatially controllable synthesis of remote sensing imagery for various high-level vision tasks, e.g., detection, segmentation, and extraction. Specifically, TerraGen introduces a geographic-spatial layout encoder that unifies bounding box and segmentation mask inputs, combined with a multi-scale injection scheme and a mask-weighted loss that explicitly encode spatial constraints, from global structures to fine details. We also construct the first large-scale multi-task remote sensing layout generation dataset, containing 45k images, and establish a standardized evaluation protocol for this task. Experimental results show that TerraGen achieves the best image generation quality across diverse tasks. Additionally, TerraGen can serve as a universal data augmentation generator, significantly improving downstream task performance and demonstrating robust cross-task generalization in both full-data and few-shot scenarios.
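The abstract does not give the layout encoder's architecture or the loss formula. As a rough illustration only, the pure-Python sketch below assumes that a layout is a stack of per-class binary maps. Bounding boxes are rasterized into their class channel and segmentation masks are merged into the same stack, so both input types share one representation. `max_pool` builds the coarser copies of a map that a multi-scale injection scheme would need, and `mask_weighted_mse` up-weights foreground pixels by a constant `fg_weight`. All function names, the box format `(class_id, x0, y0, x1, y1)`, and the constant-weight scheme are hypothetical choices, not TerraGen's published design.

```python
# Hypothetical sketch of unified layout handling and a mask-weighted loss.
# NOT TerraGen's actual implementation; names and formats are assumptions.
from typing import List, Optional, Sequence, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int, int]  # (class_id, x0, y0, x1, y1), half-open pixel coords


def empty_grid(height: int, width: int) -> Grid:
    return [[0.0] * width for _ in range(height)]


def unify_layout(height: int, width: int, num_classes: int,
                 boxes: Sequence[Box] = (),
                 masks: Optional[Sequence[Tuple[int, Grid]]] = None) -> List[Grid]:
    """Rasterize boxes and merge (class_id, mask) pairs into one per-class channel stack."""
    layout = [empty_grid(height, width) for _ in range(num_classes)]
    for cls, x0, y0, x1, y1 in boxes:
        # Clip the box to the image bounds and fill its area in the class channel.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                layout[cls][y][x] = 1.0
    for cls, mask in masks or []:
        # Any positive mask pixel marks that location as belonging to the class.
        for y in range(height):
            for x in range(width):
                if mask[y][x] > 0:
                    layout[cls][y][x] = 1.0
    return layout


def foreground(layout: List[Grid]) -> Grid:
    """Union of all class channels: 1.0 wherever any object is present."""
    h, w = len(layout[0]), len(layout[0][0])
    return [[max(ch[y][x] for ch in layout) for x in range(w)] for y in range(h)]


def max_pool(grid: Grid, factor: int) -> Grid:
    """Downsample a map by max-pooling non-overlapping factor x factor cells."""
    h, w = len(grid), len(grid[0])
    return [
        [max(grid[y][x]
             for y in range(by, min(by + factor, h))
             for x in range(bx, min(bx + factor, w)))
         for bx in range(0, w, factor)]
        for by in range(0, h, factor)
    ]


def mask_weighted_mse(pred: Grid, target: Grid, fg_mask: Grid,
                      fg_weight: float = 2.0) -> float:
    """Weighted mean squared error in which foreground pixels count fg_weight times."""
    num, den = 0.0, 0.0
    for p_row, t_row, m_row in zip(pred, target, fg_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = fg_weight if m > 0 else 1.0
            num += w * (p - t) ** 2
            den += w
    return num / den


if __name__ == "__main__":
    # One box of class 0 and one mask of class 1 on a 4x4 canvas.
    seg = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    layout = unify_layout(4, 4, num_classes=2, boxes=[(0, 0, 0, 2, 2)], masks=[(1, seg)])
    fg = foreground(layout)
    scales = [max_pool(fg, f) for f in (1, 2, 4)]  # full, 1/2, 1/4 resolution
    print([len(s) for s in scales])
    print(mask_weighted_mse([[1.0, 0.0]], [[0.0, 0.0]], [[1, 0]], fg_weight=3.0))
```

Max-pooling (rather than averaging) keeps small objects such as vehicles visible at the coarsest scales. In a latent diffusion model, a loss of this kind would typically be applied to the noise-prediction error with the mask resized to the latent resolution; that detail is likewise an assumption here.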