LG, SE, Jun 3

Testing Neural Networks via Bayesian-Guided Exploration of Decision Landscapes

arXiv:2606.04314
AI Analysis

For developers of safety-critical neural networks, BayesWarp offers a more effective testing method to uncover diverse failures while preserving data semantics.

BayesWarp improves neural network testing by using Bayesian optimization to guide mutations of decision-critical regions, achieving better failure discovery and diversity across MNIST, CIFAR-10, and ImageNet models.

As neural networks are increasingly deployed in safety-critical domains, testing is essential to evaluate and improve their reliability. Existing testing methods, whether black-box or white-box, rely primarily on global mutation or coverage-guided strategies. Both struggle to uncover diverse model failures efficiently while staying close to the original data distribution and semantics. We propose BayesWarp, a testing framework that addresses this limitation in two ways: it mutates decision-critical input regions identified via interpretable saliency techniques, and it adaptively guides the testing process with an uncertainty-aware Bayesian optimization strategy. Together, these enable the discovery of diverse failures while preserving distributional and semantic proximity to the original data. Evaluation on MNIST, CIFAR-10, and ImageNet across six neural network models shows that BayesWarp improves failure discovery, failure diversity, test case quality, and critical neuron coverage under a fixed mutation budget, demonstrating improved testing effectiveness. Moreover, fine-tuning with the generated failure cases improves model performance.
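
The abstract names two ingredients, saliency-restricted mutation and uncertainty-aware Bayesian optimization, without giving implementation details. The following is a minimal sketch of how the two could fit together, not BayesWarp itself. Everything in it is an assumption made for illustration: a toy linear classifier stands in for the neural network, the saliency of a feature is taken to be the magnitude of its weight, the mutation is a single scalar shift `t` applied to the top-`k` salient features, and the search uses a small Gaussian-process upper-confidence-bound rule. All function names and parameters (`search_failure`, `eps`, `beta`, and so on) are hypothetical, and the failure-diversity objective is omitted.

```python
import math

def score(w, b, x):
    """Linear decision score; its sign is the predicted class (toy model)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def salient_indices(w, k):
    """For a linear model the input gradient is w, so |w_i| serves as saliency."""
    return sorted(range(len(w)), key=lambda i: -abs(w[i]))[:k]

def mutate(x, mask, t, eps):
    """Shift only the masked (decision-critical) features by t * eps."""
    return [xi + t * eps if i in mask else xi for i, xi in enumerate(x)]

def rbf(a, b, length=0.3):
    """Squared-exponential kernel on scalar mutation parameters."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

def gp_posterior(ts, ys, t, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at point t."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(ts)]
         for i, a in enumerate(ts)]
    k_star = [rbf(t, a) for a in ts]
    alpha = solve(K, ys)
    v = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(t, t) - sum(k * vi for k, vi in zip(k_star, v)), 0.0)
    return mean, math.sqrt(var)

def search_failure(w, b, x, k=2, eps=1.0, budget=8, beta=2.0):
    """Return (t, mutated_input) for the first label flip found, else None."""
    label = 1 if score(w, b, x) > 0 else -1
    mask = set(salient_indices(w, k))
    grid = [i / 10 - 1 for i in range(21)]  # candidate t values in [-1, 1]
    ts, ys = [], []
    t = 0.0  # first evaluation is the unmodified input
    for _ in range(budget):
        x_new = mutate(x, mask, t, eps)
        loss = -label * score(w, b, x_new)  # positive once the label flips
        if loss > 0:
            return t, x_new
        ts.append(t)
        ys.append(loss)

        def ucb(c):
            # Favor candidates with high predicted loss or high uncertainty.
            m, s = gp_posterior(ts, ys, c)
            return m + beta * s

        t = max((c for c in grid if c not in ts), key=ucb)
    return None
```

Calling `search_failure([2.0, -0.1, 1.5, 0.05], 0.0, [0.2, 0.5, 0.1, 0.3])` perturbs only features 0 and 2, the two with the largest weights, and leaves the others untouched. The budget cap mirrors the fixed mutation budget mentioned in the abstract; a real image model would replace the weight-magnitude saliency with a gradient- or attribution-based map and the scalar `t` with a richer mutation space.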

