CV | Oct 11, 2025

A Style-Based Profiling Framework for Quantifying the Synthetic-to-Real Gap in Autonomous Driving Datasets

arXiv:2510.10203v2 | h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the challenge of model generalization for autonomous driving perception systems by providing a data-centric tool to improve synthetic dataset quality, though it is incremental as it builds on existing style extraction and metric learning techniques.

The paper tackles the problem of the domain gap between synthetic and real-world datasets in autonomous driving by introducing a style-based profiling framework with a novel metric called Style Embedding Distribution Discrepancy (SEDD), which quantifies this gap and enables systematic diagnosis and enhancement of synthetic datasets.

Ensuring the reliability of autonomous driving perception systems requires extensive environment-based testing, yet real-world execution is often impractical. Synthetic datasets have therefore emerged as a promising alternative, offering advantages such as cost-effectiveness, bias-free labeling, and controllable scenarios. However, the domain gap between synthetic and real-world datasets remains a major obstacle to model generalization. To address this challenge from a data-centric perspective, this paper introduces a profile extraction and discovery framework for characterizing the style profiles underlying both synthetic and real image datasets. We propose Style Embedding Distribution Discrepancy (SEDD) as a novel evaluation metric. Our framework combines Gram matrix-based style extraction with metric learning optimized for intra-class compactness and inter-class separation to extract style embeddings. Furthermore, we establish a benchmark using publicly available datasets. Experiments are conducted on a variety of datasets and sim-to-real methods, and the results show that our method is capable of quantifying the synthetic-to-real gap. This work provides a standardized profiling-based quality control paradigm that enables systematic diagnosis and targeted enhancement of synthetic datasets, advancing future development of data-driven autonomous driving systems.
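To make the Gram matrix-based style extraction mentioned in the abstract more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation. The random arrays stand in for CNN feature maps, and `centroid_distance` is a hypothetical placeholder for a distribution gap: the abstract does not give the SEDD formula, so this function should not be read as SEDD. The metric-learning stage is omitted entirely.

```python
import numpy as np


def gram_matrix(features):
    """Map a (C, H, W) feature map to a (C, C) Gram matrix, normalized by H*W."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)


def style_vector(features):
    """Flatten the upper triangle of the symmetric Gram matrix into a vector."""
    g = gram_matrix(features)
    rows, cols = np.triu_indices(g.shape[0])
    return g[rows, cols]


def centroid_distance(emb_a, emb_b):
    """Hypothetical stand-in, NOT the paper's SEDD: Euclidean distance
    between the mean style vectors of two sets of embeddings."""
    return float(np.linalg.norm(emb_a.mean(axis=0) - emb_b.mean(axis=0)))


rng = np.random.default_rng(0)

# Assumption: random feature maps with different scales imitate two "styles".
real = np.stack([style_vector(rng.normal(0.0, 1.0, (4, 8, 8))) for _ in range(16)])
synth = np.stack([style_vector(rng.normal(0.0, 2.0, (4, 8, 8))) for _ in range(16)])

gap_real_synth = centroid_distance(real, synth)
gap_real_real = centroid_distance(real[:8], real[8:])

print(f"real vs. synthetic: {gap_real_synth:.3f}")
print(f"real vs. real:      {gap_real_real:.3f}")
```

In this toy setup the gap between the two "styles" should come out much larger than the gap between two halves of the same set, which is the qualitative behavior a synthetic-to-real gap metric is meant to capture.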

