CV · Nov 28, 2025

Synthetic Industrial Object Detection: GenAI vs. Feature-Based Methods

Jose Moises Araya-Martinez, Adrián Sanchis Reig, Gautham Mohan, Sarvenaz Sardari, Jens Lambrecht, Jörg Krüger

arXiv:2511.23241v16.21 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of reducing data annotation costs for industrial and robotics applications, offering incremental improvements by comparing existing methods.

The paper benchmarks domain randomization and domain adaptation techniques for generating synthetic data for industrial object detection. It finds that simpler feature-based methods such as perceptual hashing outperform generative AI approaches in both accuracy and resource efficiency, with perceptual hashing reaching mAP50 scores of 98% on the industrial dataset and 67% on the robotics dataset.

Reducing the burden of data generation and annotation remains a major challenge for the cost-effective deployment of machine learning in industrial and robotics settings. While synthetic rendering is a promising solution, bridging the sim-to-real gap often requires expert intervention. In this work, we benchmark a range of domain randomization (DR) and domain adaptation (DA) techniques, including feature-based methods, generative AI (GenAI), and classical rendering approaches, for creating contextualized synthetic data without manual annotation. Our evaluation focuses on the effectiveness and efficiency of low-level and high-level feature alignment, as well as a controlled diffusion-based DA method guided by prompts generated from real-world contexts. We validate our methods on two datasets: a proprietary industrial dataset (automotive and logistics) and a public robotics dataset. Results show that if render-based data with enough variability is available as a seed, simpler feature-based methods, such as brightness-based and perceptual hashing filtering, outperform more complex GenAI-based approaches in both accuracy and resource efficiency. Perceptual hashing consistently achieves the highest performance, with mAP50 scores of 98% and 67% on the industrial and robotics datasets, respectively. Additionally, GenAI methods add significant time overhead for data generation with no apparent improvement in sim-to-real mAP values compared to simpler methods. Our findings offer actionable insights for efficiently bridging the sim-to-real gap, enabling high real-world performance from models trained exclusively on synthetic data.
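The abstract does not spell out how perceptual hashing filtering is applied, so the following is only a minimal sketch of one plausible reading: keep a rendered image if its DCT-based perceptual hash lies within some Hamming distance of at least one real reference image. The function names (`phash`, `hamming`, `filter_by_phash`), the nearest-neighbour resize, and the `max_distance` threshold are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def _resize_nearest(gray, size):
    """Nearest-neighbour resize of a 2-D grayscale array to (size, size)."""
    rows = np.linspace(0, gray.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, size).round().astype(int)
    return gray[np.ix_(rows, cols)].astype(np.float64)


def _dct_matrix(n):
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat


def phash(gray, hash_size=8, scale=4):
    """DCT-based perceptual hash: a boolean vector of hash_size**2 bits."""
    n = hash_size * scale
    small = _resize_nearest(gray, n)
    d = _dct_matrix(n)
    coeffs = d @ small @ d.T               # 2-D DCT of the downsized image
    low = coeffs[:hash_size, :hash_size]   # keep the low-frequency block
    return (low > np.median(low)).flatten()


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))


def filter_by_phash(synthetic, real, max_distance=20):
    """Indices of synthetic images whose nearest real hash is close enough.

    max_distance is a hypothetical threshold; the source does not give one.
    """
    real_hashes = [phash(r) for r in real]
    kept = []
    for idx, img in enumerate(synthetic):
        h = phash(img)
        if min(hamming(h, rh) for rh in real_hashes) <= max_distance:
            kept.append(idx)
    return kept
```

In this sketch the inputs are 2-D grayscale NumPy arrays; colour renders would need converting first. A filter of this kind needs only one small DCT per image, which is consistent with the abstract's point that feature-based methods are cheaper than diffusion-based generation.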

