LGSep 23, 2022
Jensen-Shannon Divergence Based Novel Loss Functions for Bayesian Neural NetworksPonkrshnan Thiagarajan, Susanta Ghosh
Bayesian neural networks (BNNs) are state-of-the-art machine learning methods that can naturally regularize and systematically quantify uncertainties using their stochastic parameters. Kullback-Leibler (KL) divergence-based variational inference used in BNNs suffers from unstable optimization and challenges in approximating light-tailed posteriors due to the unbounded nature of the KL divergence. To resolve these issues, we formulate a novel loss function for BNNs based on a new modification to the generalized Jensen-Shannon (JS) divergence, which is bounded. In addition, we propose a Geometric JS divergence-based loss, which is computationally efficient since it can be evaluated analytically. We found that the JS divergence-based variational inference is intractable, and hence employed a constrained optimization framework to formulate these losses. Our theoretical analysis and empirical experiments on multiple regression and classification data sets suggest that the proposed losses perform better than the KL divergence-based loss, especially when the data sets are noisy or biased. Specifically, there are approximately 5% and 8% improvements in accuracy for a noise-added CIFAR-10 dataset and a regression dataset, respectively. There is about a 13% reduction in false negative predictions of a biased histopathology dataset. In addition, we quantify and compare the uncertainty metrics for the regression and classification tasks.
MTRL-SCIJul 11, 2025
Surprisingly High Redundancy in Electronic Structure DataSazzad Hossain, Ponkrshnan Thiagarajan, Shashank Pathrudkar et al.
Machine Learning (ML) models for electronic structure rely on large datasets generated through expensive Kohn-Sham Density Functional Theory simulations. This study reveals a surprisingly high level of redundancy in such datasets across various material systems, including molecules, simple metals, and complex alloys. Our findings challenge the prevailing assumption that large, exhaustive datasets are necessary for accurate ML predictions of electronic structure. We demonstrate that even random pruning can substantially reduce dataset size with minimal loss in predictive accuracy, while a state-of-the-art coverage-based pruning strategy retains chemical accuracy and model generalizability using up to 100-fold less data and reducing training time by threefold or more. By contrast, widely used importance-based pruning methods, which eliminate seemingly redundant data, can catastrophically fail at higher pruning factors, possibly due to the significant reduction in data coverage. This heretofore unexplored high degree of redundancy in electronic structure data holds the potential to identify a minimal, essential dataset representative of each material class.
MLJul 19, 2025
Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural OperatorsPonkrshnan Thiagarajan, Tamer A. Zaki, Michael D. Shields
Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the network's parameter space and the non-convexity of their posterior distributions. Therefore, various approximation techniques, such as variational inference (VI) or stochastic gradient MCMC, are often employed to infer the posterior distribution of the network parameters. Such approximations introduce inaccuracies in the inferred distributions, resulting in unreliable uncertainty estimates. In this work, we propose a hybrid approach that combines inexpensive VI and accurate HMC methods to efficiently and accurately quantify uncertainties in neural networks and neural operators. The proposed approach leverages an initial VI training on the full network. We examine the influence of individual parameters on the prediction uncertainty, which shows that a large proportion of the parameters do not contribute substantially to uncertainty in the network predictions. This information is then used to significantly reduce the dimension of the parameter space, and HMC is performed only for the subset of network parameters that strongly influence prediction uncertainties. This yields a framework for accelerating the full batch HMC for posterior inference in neural networks. We demonstrate the efficiency and accuracy of the proposed framework on deep neural networks and operator networks, showing that inference can be performed for large networks with tens to hundreds of thousands of parameters. We show that this method can effectively learn surrogates for complex physical systems by modeling the operator that maps from upstream conditions to wall-pressure data on a cone in hypersonic flow.
CVOct 7, 2020
Explanation and Use of Uncertainty Quantified by Bayesian Neural Network Classifiers for Breast Histopathology ImagesPonkrshnan Thiagarajan, Pushkar Khairnar, Susanta Ghosh
Despite the promise of Convolutional neural network (CNN) based classification models for histopathological images, it is infeasible to quantify its uncertainties. Moreover, CNNs may suffer from overfitting when the data is biased. We show that Bayesian-CNN can overcome these limitations by regularizing automatically and by quantifying the uncertainty. We have developed a novel technique to utilize the uncertainties provided by the Bayesian-CNN that significantly improves the performance on a large fraction of the test data (about 6% improvement in accuracy on 77% of test data). Further, we provide a novel explanation for the uncertainty by projecting the data into a low dimensional space through a nonlinear dimensionality reduction technique. This dimensionality reduction enables interpretation of the test data through visualization and reveals the structure of the data in a low dimensional feature space. We show that the Bayesian-CNN can perform much better than the state-of-the-art transfer learning CNN (TL-CNN) by reducing the false negative and false positive by 11% and 7.7% respectively for the present data set. It achieves this performance with only 1.86 million parameters as compared to 134.33 million for TL-CNN. Besides, we modify the Bayesian-CNN by introducing a stochastic adaptive activation function. The modified Bayesian-CNN performs slightly better than Bayesian-CNN on all performance metrics and significantly reduces the number of false negatives and false positives (3% reduction for both). We also show that these results are statistically significant by performing McNemar's statistical significance test. This work shows the advantages of Bayesian-CNN against the state-of-the-art, explains and utilizes the uncertainties for histopathological images. It should find applications in various medical image classifications.