LG | Sep 15, 2025

Diffusion-Based Generation and Imputation of Driving Scenarios from Limited Vehicle CAN Data

arXiv:2509.12375v1 | h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses data scarcity and quality issues in automotive AI applications, offering a domain-specific solution for generating and improving driving scenario data.

The paper tackles the challenge of generating realistic synthetic automotive time series data from limited and corrupted vehicle CAN datasets. Its best model outperforms the training data in physical correctness and successfully imputes physically implausible regions in that data.

Training deep learning methods on small time series datasets that also include corrupted samples is challenging. Diffusion models have been shown to be effective at generating realistic synthetic data and at correcting corrupted samples through imputation. In this context, this paper focuses on generating synthetic yet realistic samples of automotive time series data. We show that denoising diffusion probabilistic models (DDPMs) can effectively solve this task by applying them to a challenging vehicle CAN dataset with long-term data and a limited number of samples. To this end, we propose a hybrid generative approach that combines autoregressive and non-autoregressive techniques. We evaluate our approach with two recently proposed DDPM architectures for time series generation, for which we propose several improvements. To evaluate the generated samples, we propose three metrics that quantify physical correctness and test track adherence. Our best model is able to outperform even the training data in terms of physical correctness while showing plausible driving behavior. Finally, we use our best model to successfully impute physically implausible regions in the training data, thereby improving the data quality.
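The abstract does not spell out how the DDPM noising or the imputation step is implemented, so the following is only a generic sketch of the two textbook ingredients, not the authors' method. It shows the standard closed-form forward noising of a signal and a mask-based merge in which trusted time steps keep the (noised) observation while flagged time steps take the model's sample. The function names, the linear schedule defaults, the example speed values, and the placeholder model output are all assumptions made for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, alpha_bar_t, rng):
    """Closed-form forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).
    """
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def merge_for_imputation(observed_noised, model_sample, implausible_mask):
    """Keep the noised observation where the data is trusted and use the
    model's sample where the mask flags an implausible time step."""
    return [
        m if flagged else o
        for o, m, flagged in zip(observed_noised, model_sample, implausible_mask)
    ]


# Hypothetical usage: a short speed trace whose third value is flagged.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
speed = [10.0, 10.5, 99.0, 11.5]          # made-up values, not from the paper
mask = [False, False, True, False]        # True marks an implausible step
noised = q_sample(speed, alpha_bars[10], rng)
model_sample = [0.0] * len(speed)         # placeholder for a denoiser output
merged = merge_for_imputation(noised, model_sample, mask)
```

In a full sampler this merge would be repeated at every reverse step, with a trained denoiser producing `model_sample`; the sketch omits the network and the reverse update entirely.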

