CV · Oct 24, 2025

TerraGen: A Unified Multi-Task Layout Generation Framework for Remote Sensing Data Augmentation

Datao Tang, Hao Wang, Yudeng Xin, Hui Qiao, Dongsheng Jiang, Yin Li, Zhiheng Yu, Xiangyong Cao

arXiv:2510.21391v1 · 3 citations · h-index: 3

Originality: Incremental advance

AI Analysis

The paper addresses the need for spatially controllable data augmentation across multiple remote sensing vision tasks, though the advance is incremental because it builds on existing layout-to-image methods.

The paper tackles the problem of task-isolated generative data augmentation in remote sensing by proposing TerraGen, a unified layout-to-image framework that achieves the best generation quality across diverse tasks and significantly enhances downstream performance in both full-data and few-shot scenarios.

Remote sensing vision tasks require extensive labeled data across multiple, interconnected domains. However, current generative data augmentation frameworks are task-isolated, i.e., each vision task requires training an independent generative model, and they ignore the modeling of geographical information and spatial constraints. To address these issues, we propose TerraGen, a unified layout-to-image generation framework that enables flexible, spatially controllable synthesis of remote sensing imagery for various high-level vision tasks, e.g., detection, segmentation, and extraction. Specifically, TerraGen introduces a geographic-spatial layout encoder that unifies bounding box and segmentation mask inputs, combined with a multi-scale injection scheme and a mask-weighted loss to explicitly encode spatial constraints, from global structures to fine details. We also construct the first large-scale multi-task remote sensing layout generation dataset, containing 45k images, and establish a standardized evaluation protocol for this task. Experimental results show that TerraGen achieves the best image generation quality across diverse tasks. Additionally, TerraGen can be used as a universal data-augmentation generator, significantly enhancing downstream task performance and demonstrating robust cross-task generalisation in both full-data and few-shot scenarios.
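The abstract does not give the exact form of the layout encoder or the mask-weighted loss, so the following is only a minimal sketch of the two ideas under stated assumptions: bounding boxes and segmentation masks are both converted to one shared per-class layout map, and a reconstruction loss is up-weighted inside object regions. The function names, the one-hot layout representation, the mean-squared-error base loss, and the `fg_weight` parameter are all hypothetical and are not taken from the paper.

```python
import numpy as np


def boxes_to_layout(boxes, labels, height, width, num_classes):
    """Rasterize pixel boxes (x0, y0, x1, y1) into a (num_classes, H, W) layout map.

    Assumption: a binary per-class map is used as the shared layout format.
    """
    layout = np.zeros((num_classes, height, width), dtype=np.float32)
    for (x0, y0, x1, y1), cls in zip(boxes, labels):
        layout[cls, y0:y1, x0:x1] = 1.0
    return layout


def mask_to_layout(mask, num_classes):
    """Convert an integer class mask (H, W) into the same layout format.

    Assumption: background pixels are labelled -1 and map to all-zero channels.
    """
    layout = np.zeros((num_classes,) + mask.shape, dtype=np.float32)
    for cls in range(num_classes):
        layout[cls] = (mask == cls)
    return layout


def mask_weighted_mse(pred, target, layout, fg_weight=2.0):
    """Mean squared error with a higher weight on pixels covered by any object.

    pred, target: arrays of shape (C, H, W); layout: (num_classes, H, W).
    fg_weight is a hypothetical hyperparameter (background weight is 1).
    """
    foreground = layout.max(axis=0)                  # (H, W), 1 inside objects
    weights = 1.0 + (fg_weight - 1.0) * foreground   # broadcast over channels
    squared_error = (pred - target) ** 2
    return float((weights * squared_error).sum() / (weights.sum() * pred.shape[0]))


# Usage: a box and an equivalent mask yield the same layout map.
box_layout = boxes_to_layout([(1, 1, 3, 3)], [0], height=4, width=4, num_classes=2)
mask = np.full((4, 4), -1)
mask[1:3, 1:3] = 0
mask_layout = mask_to_layout(mask, num_classes=2)

# An error inside the object region costs more than the same error outside it.
target = np.zeros((1, 4, 4), dtype=np.float32)
pred_fg = target.copy()
pred_fg[0, 1, 1] = 1.0   # error on a foreground pixel
pred_bg = target.copy()
pred_bg[0, 0, 0] = 1.0   # error on a background pixel
loss_fg = mask_weighted_mse(pred_fg, target, box_layout)
loss_bg = mask_weighted_mse(pred_bg, target, box_layout)
```

With a shared layout format, one conditional generator can in principle consume detection-style and segmentation-style annotations interchangeably, which is the motivation the abstract gives for unifying the two input types.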
