Silvia Vock

CV
h-index4
6papers
9citations
Novelty38%
AI Score45

6 Papers

CVJun 1
Train, Test, Re-evaluate: Schedule-Sensitive Evaluation of Generative Data for Hand Detection

Atmika Bhardwaj, Silvia Vock, Nico Steckhan

Generated (or synthetic) image data is increasingly used to augment or replace real training datasets when target imagery is scarce, expensive, or biased. For hand detection, particularly in occupational safety settings, public datasets mostly contain bare hands. This under-represents the variation in hand appearance introduced by gloves, tattoos, jewelry, and other personal protective equipment, creating a distribution shift that safety-critical applications encounter at deployment. We test whether generative inpainting, editing only the hand region of a real photograph to introduce accessories, can close this shift gap. On a paired dataset of real images and their synthetic counterparts, we train YOLOv8n hand detectors under six training-and-scheduling regimes (Experiments A-F, three random seeds each), evaluate every detector on a real test set and on a real-gloves-only test split, and report the mean average precision (mAP) at two overlap thresholds (mAP@0.5 and mAP@0.5:0.95) along with paired statistical tests. A two-stage experiment: train on real U synthetic data, then fine-tune the resulting weights on real-only at a lower learning rate, increases mAP@0.5 compared to the real-only baseline model on the standard real test set, and improves the real-gloves out-of-distribution gap. Another three-stage experiment preserves box-tightness best, reaching the highest mAP@0.5:0.95 of any other experiment in the study. The synthetic-data utility for safety-critical hand detection is determined by the training procedure, and simple multi-stage experiments extract substantial real-deployment benefit from inpainted accessory data.

CVMay 26
Semantic Robustness Probing via Inpainting: An Interactive Tool for Safety-Critical Object Detection

Nico Steckhan, Krutarth Prajapati, Weija Shao et al.

Testing object detectors in safety-critical domains requires semantically meaningful probes beyond pixel-level corruptions. We present SemProbe, a tool for semantic robustness probing: users upload deployment images, create masks manually or automatically, select operational design domain-derived factors (or custom prompts), and run diffusion-based controlled inpainting. The system supports batch jobs, parallel seed/workflow variations, and configurable generation parameters. After each output, model inference runs automatically and displays annotated before/after comparisons with performance deltas. All probes are logged as structured artifacts, enabling traceable robustness evidence aligned with safety evaluation workflows. We demonstrate \textsc{SemProbe} on hand detection for dimension saws, targeting factors from insurance-oriented test criteria.

LGJun 27, 2022
Utilizing Class Separation Distance for the Evaluation of Corruption Robustness of Machine Learning Classifiers

Georg Siedel, Silvia Vock, Andrey Morozov et al.

Robustness is a fundamental pillar of Machine Learning (ML) classifiers, substantially determining their reliability. Methods for assessing classifier robustness are therefore essential. In this work, we address the challenge of evaluating corruption robustness in a way that allows comparability and interpretability on a given dataset. We propose a test data augmentation method that uses a robustness distance $ε$ derived from the datasets minimal class separation distance. The resulting MSCR (minimal separation corruption robustness) metric allows a dataset-specific comparison of different classifiers with respect to their corruption robustness. The MSCR value is interpretable, as it represents the classifiers avoidable loss of accuracy due to statistical corruptions. On 2D and image data, we show that the metric reflects different levels of classifier robustness. Furthermore, we observe unexpected optima in classifiers robust accuracy through training and testing classifiers with different levels of noise. While researchers have frequently reported on a significant tradeoff on accuracy when training robust models, we strengthen the view that a tradeoff between accuracy and corruption robustness is not inherent. Our results indicate that robustness training through simple data augmentation can already slightly improve accuracy.

CVDec 17, 2025
Stylized Synthetic Augmentation further improves Corruption Robustness

Georg Siedel, Rojan Regmi, Abhirami Anand et al.

This paper proposes a training data augmentation pipeline that combines synthetic image data with neural style transfer in order to address the vulnerability of deep vision models to common corruptions. We show that although applying style transfer on synthetic images degrades their quality with respect to the common Frechet Inception Distance (FID) metric, these images are surprisingly beneficial for model training. We conduct a systematic empirical analysis of the effects of both augmentations and their key hyperparameters on the performance of image classifiers. Our results demonstrate that stylization and synthetic data complement each other well and can be combined with popular rule-based data augmentation techniques such as TrivialAugment, while not working with others. Our method achieves state-of-the-art corruption robustness on several small-scale image classification benchmarks, reaching 93.54%, 74.9% and 50.86% robust accuracy on CIFAR-10-C, CIFAR-100-C and TinyImageNet-C, respectively

CVJul 22, 2025
Combined Image Data Augmentations diminish the benefits of Adaptive Label Smoothing

Georg Siedel, Ekagra Gupta, Weijia Shao et al.

Soft augmentation regularizes the supervised learning process of image classifiers by reducing label confidence of a training sample based on the magnitude of random-crop augmentation applied to it. This paper extends this adaptive label smoothing framework to other types of aggressive augmentations beyond random-crop. Specifically, we demonstrate the effectiveness of the method for random erasing and noise injection data augmentation. Adaptive label smoothing permits stronger regularization via higher-intensity Random Erasing. However, its benefits vanish when applied with a diverse range of image transformations as in the state-of-the-art TrivialAugment method, and excessive label smoothing harms robustness to common corruptions. Our findings suggest that adaptive label smoothing should only be applied when the training data distribution is dominated by a limited, homogeneous set of image transformation types.

LGMay 9, 2023
Investigating the Corruption Robustness of Image Classifiers with Random Lp-norm Corruptions

Georg Siedel, Weijia Shao, Silvia Vock et al.

Robustness is a fundamental property of machine learning classifiers required to achieve safety and reliability. In the field of adversarial robustness of image classifiers, robustness is commonly defined as the stability of a model to all input changes within a p-norm distance. However, in the field of random corruption robustness, variations observed in the real world are used, while p-norm corruptions are rarely considered. This study investigates the use of random p-norm corruptions to augment the training and test data of image classifiers. We evaluate the model robustness against imperceptible random p-norm corruptions and propose a novel robustness metric. We empirically investigate whether robustness transfers across different p-norms and derive conclusions on which p-norm corruptions a model should be trained and evaluated. We find that training data augmentation with a combination of p-norm corruptions significantly improves corruption robustness, even on top of state-of-the-art data augmentation schemes.