CVFeb 24Code
SynthRender and IRIS: Open-Source Framework and Dataset for Bidirectional Sim-Real Transfer in Industrial Object PerceptionJose Moises Araya-Martinez, Thushar Tom, Adrián Sanchis Reig et al.
Object perception is fundamental for tasks such as robotic material handling and quality inspection. However, modern supervised deep-learning perception models require large datasets for robust automation under semi-uncontrolled conditions. The cost of acquiring and annotating such data for proprietary parts is a major barrier for widespread deployment. In this context, we release SynthRender, an open source framework for synthetic image generation with Guided Domain Randomization capabilities. Furthermore, we benchmark recent Reality-to-Simulation techniques for 3D asset creation from 2D images of real parts. Combined with Domain Randomization, these synthetic assets provide low-overhead, transferable data even for parts lacking 3D files. We also introduce IRIS, the Industrial Real-Sim Imagery Set, containing 32 categories with diverse textures, intra-class variation, strong inter-class similarities and about 20,000 labels. Ablations on multiple benchmarks outline guidelines for efficient data generation with SynthRender. Our method surpasses existing approaches, achieving 99.1% mAP@50 on a public robotics dataset, 98.3% mAP@50 on an automotive benchmark, and 95.3% mAP@50 on IRIS.
CVNov 28, 2025
Synthetic Industrial Object Detection: GenAI vs. Feature-Based MethodsJose Moises Araya-Martinez, Adrián Sanchis Reig, Gautham Mohan et al.
Reducing the burden of data generation and annotation remains a major challenge for the cost-effective deployment of machine learning in industrial and robotics settings. While synthetic rendering is a promising solution, bridging the sim-to-real gap often requires expert intervention. In this work, we benchmark a range of domain randomization (DR) and domain adaptation (DA) techniques, including feature-based methods, generative AI (GenAI), and classical rendering approaches, for creating contextualized synthetic data without manual annotation. Our evaluation focuses on the effectiveness and efficiency of low-level and high-level feature alignment, as well as a controlled diffusion-based DA method guided by prompts generated from real-world contexts. We validate our methods on two datasets: a proprietary industrial dataset (automotive and logistics) and a public robotics dataset. Results show that if render-based data with enough variability is available as seed, simpler feature-based methods, such as brightness-based and perceptual hashing filtering, outperform more complex GenAI-based approaches in both accuracy and resource efficiency. Perceptual hashing consistently achieves the highest performance, with mAP50 scores of 98% and 67% on the industrial and robotics datasets, respectively. Additionally, GenAI methods present significant time overhead for data generation at no apparent improvement of sim-to-real mAP values compared to simpler methods. Our findings offer actionable insights for efficiently bridging the sim-to-real gap, enabling high real-world performance from models trained exclusively on synthetic data.
CVNov 28, 2025
Zero-Shot Multi-Criteria Visual Quality Inspection for Semi-Controlled Industrial Environments via Real-Time 3D Digital Twin SimulationJose Moises Araya-Martinez, Gautham Mohan, Kenichi Hayakawa Bolaños et al.
Early-stage visual quality inspection is vital for achieving Zero-Defect Manufacturing and minimizing production waste in modern industrial environments. However, the complexity of robust visual inspection systems and their extensive data requirements hinder widespread adoption in semi-controlled industrial settings. In this context, we propose a pose-agnostic, zero-shot quality inspection framework that compares real scenes against real-time Digital Twins (DT) in the RGB-D space. Our approach enables efficient real-time DT rendering by semantically describing industrial scenes through object detection and pose estimation of known Computer-Aided Design models. We benchmark tools for real-time, multimodal RGB-D DT creation while tracking consumption of computational resources. Additionally, we provide an extensible and hierarchical annotation strategy for multi-criteria defect detection, unifying pose labelling with logical and structural defect annotations. Based on an automotive use case featuring the quality inspection of an axial flux motor, we demonstrate the effectiveness of our framework. Our results demonstrate detection performace, achieving intersection-over-union (IoU) scores of up to 63.3% compared to ground-truth masks, even if using simple distance measurements under semi-controlled industrial conditions. Our findings lay the groundwork for future research on generalizable, low-data defect detection methods in dynamic manufacturing settings.