LGFLU-DYNAug 23, 2025

Quantifying Out-of-Training Uncertainty of Neural-Network based Turbulence Closures

arXiv:2508.16891v1h-index: 2
Originality Synthesis-oriented
AI Analysis

This work addresses the bottleneck of uncertainty quantification for ML-based turbulence closures, which is crucial for their adoption in computational fluid dynamics, though it is incremental as it compares existing methods on a specific problem.

The paper tackled the problem of quantifying out-of-training uncertainty for neural-network-based turbulence closures in CFD simulations, finding that Gaussian Process (GP) performed best in accuracy with an RMSE of 2.14e-5, while Deep Ensembles (DE) showed robust uncertainty quantification with competitive performance in out-of-training regions.

Neural-Network (NN) based turbulence closures have been developed for being used as pre-trained surrogates for traditional turbulence closures, with the aim to increase computational efficiency and prediction accuracy of CFD simulations. The bottleneck to the widespread adaptation of these ML-based closures is the relative lack of uncertainty quantification (UQ) for these models. Especially, quantifying uncertainties associated with out-of-training inputs, that is when the ML-based turbulence closures are queried on inputs outside their training data regime. In the current paper, a published algebraic turbulence closure1 has been utilized to compare the quality of epistemic UQ between three NN-based methods and Gaussian Process (GP). The three NN-based methods explored are Deep Ensembles (DE), Monte-Carlo Dropout (MCD), and Stochastic Variational Inference (SVI). In the in-training results, we find the exact GP performs the best in accuracy with a Root Mean Squared Error (RMSE) of $2.14 \cdot 10^{-5}$ followed by the DE with an RMSE of $4.59 \cdot 10^{-4}$. Next, the paper discusses the performance of the four methods for quantifying out-of-training uncertainties. For performance, the Exact GP yet again is the best in performance, but has similar performance to the DE in the out-of-training regions. In UQ accuracy for the out-of-training case, SVI and DE hold the best miscalibration error for one of the cases. However, the DE performs the best in Negative Log-Likelihood for both out-of-training cases. We observe that for the current problem, in terms of accuracy GP > DE > SV I > MCD. The DE results are relatively robust and provide intuitive UQ estimates, despite performing naive ensembling. In terms of computational cost, the GP is significantly higher than the NN-based methods with a $O(n^3)$ computational complexity for each training step

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes