CVMar 27, 2025

AGILE: A Diffusion-Based Attention-Guided Image and Label Translation for Efficient Cross-Domain Plant Trait Identification

Earl Ranario, Lars Lundqvist, Heesup Yun, Brian N. Bailey, J. Mason Earles

arXiv:2503.22019v12 citationsh-index: 72025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Originality Incremental advance

AI Analysis

This work addresses the challenge of generating training data for plant trait identification in agriculture, particularly when domain gaps are significant, though it appears incremental as it builds on existing diffusion models.

The paper tackled the problem of maintaining object-level accuracy in cross-domain image translation for plant trait identification by introducing AGILE, a diffusion-based framework that uses optimized text embeddings and attention guidance, resulting in superior semantic alignment and enhanced object detection performance compared to prior methods.

Semantically consistent cross-domain image translation facilitates the generation of training data by transferring labels across different domains, making it particularly useful for plant trait identification in agriculture. However, existing generative models struggle to maintain object-level accuracy when translating images between domains, especially when domain gaps are significant. In this work, we introduce AGILE (Attention-Guided Image and Label Translation for Efficient Cross-Domain Plant Trait Identification), a diffusion-based framework that leverages optimized text embeddings and attention guidance to semantically constrain image translation. AGILE utilizes pretrained diffusion models and publicly available agricultural datasets to improve the fidelity of translated images while preserving critical object semantics. Our approach optimizes text embeddings to strengthen the correspondence between source and target images and guides attention maps during the denoising process to control object placement. We evaluate AGILE on cross-domain plant datasets and demonstrate its effectiveness in generating semantically accurate translated images. Quantitative experiments show that AGILE enhances object detection performance in the target domain while maintaining realism and consistency. Compared to prior image translation methods, AGILE achieves superior semantic alignment, particularly in challenging cases where objects vary significantly or domain gaps are substantial.

View on arXiv PDF

Similar