MLSep 28, 2023
A Primer on Bayesian Neural Networks: Review and DebatesJulyan Arbel, Konstantinos Pitas, Mariia Vladimirova et al.
Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
LGJun 22, 2022
Cold Posteriors through PAC-BayesKonstantinos Pitas, Julyan Arbel
We investigate the cold posterior effect through the lens of PAC-Bayes generalization bounds. We argue that in the non-asymptotic setting, when the number of training samples is (relatively) small, discussions of the cold posterior effect should take into account that approximate Bayesian inference does not readily provide guarantees of performance on out-of-sample data. Instead, out-of-sample error is better described through a generalization bound. In this context, we explore the connections between the ELBO objective from variational inference and the PAC-Bayes objectives. We note that, while the ELBO and PAC-Bayes objectives are similar, the latter objectives naturally contain a temperature parameter $λ$ which is not restricted to be $λ=1$. For both regression and classification tasks, in the case of isotropic Laplace approximations to the posterior, we show how this PAC-Bayesian interpretation of the temperature parameter captures the cold posterior effect.
LGSep 11, 2023
The fine print on tempered posteriorsKonstantinos Pitas, Julyan Arbel
We conduct a detailed investigation of tempered posteriors and uncover a number of crucial and previously undiscussed points. Contrary to previous results, we first show that for realistic models and datasets and the tightly controlled case of the Laplace approximation to the posterior, stochasticity does not in general improve test accuracy. The coldest temperature is often optimal. One might think that Bayesian models with some stochasticity can at least obtain improvements in terms of calibration. However, we show empirically that when gains are obtained this comes at the cost of degradation in test accuracy. We then discuss how targeting Frequentist metrics using Bayesian models provides a simple explanation of the need for a temperature parameter $λ$ in the optimization objective. Contrary to prior works, we finally show through a PAC-Bayesian analysis that the temperature $λ$ cannot be seen as simply fixing a misspecified prior or likelihood.
CLMay 22, 2024Code
Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queriesAdam Yang, Chen Chen, Konstantinos Pitas
State-of-the-art large language models are sometimes distributed as open-source software but are also increasingly provided as a closed-source service. These closed-source large-language models typically see the widest usage by the public, however, they often do not provide an estimate of their uncertainty when responding to queries. As even the best models are prone to ``hallucinating" false information with high confidence, a lack of a reliable estimate of uncertainty limits the applicability of these models in critical settings. We explore estimating the uncertainty of closed-source LLMs via multiple rephrasings of an original base query. Specifically, we ask the model, multiple rephrased questions, and use the similarity of the answers as an estimate of uncertainty. We diverge from previous work in i) providing rules for rephrasing that are simple to memorize and use in practice ii) proposing a theoretical framework for why multiple rephrased queries obtain calibrated uncertainty estimates. Our method demonstrates significant improvements in the calibration of uncertainty estimates compared to the baseline and provides intuition as to how query strategies should be designed for optimal test calibration.
LGOct 4, 2023
Something for (almost) nothing: Improving deep ensemble calibration using unlabeled dataKonstantinos Pitas, Julyan Arbel
We present a method to improve the calibration of deep ensembles in the small training data regime in the presence of unlabeled data. Our approach is extremely simple to implement: given an unlabeled set, for each unlabeled data point, we simply fit a different randomly selected label with each ensemble member. We provide a theoretical analysis based on a PAC-Bayes bound which guarantees that if we fit such a labeling on unlabeled data, and the true labels on the training data, we obtain low negative log-likelihood and high ensemble diversity on testing samples. Empirically, through detailed experiments, we find that for low to moderately-sized training sets, our ensembles are more diverse and provide better calibration than standard ensembles, sometimes significantly.
LGJul 19, 2023
Europepolls: A Dataset of Country-Level Opinion Polling Data for the European Union and the UKKonstantinos Pitas
I propose an open dataset of country-level historical opinion polling data for the European Union and the UK. The dataset aims to fill a gap in available opinion polling data for the European Union. Some existing datasets are restricted to the past five years, limiting research opportunities. At the same time, some larger proprietary datasets exist but are available only in a visual preprocessed time series format. Finally, while other large datasets for individual countries might exist, these could be inaccessible due to language barriers. The data was gathered from Wikipedia, and preprocessed using the pandas library. Both the raw and the preprocessed data are in the .csv format. I hope that given the recent advances in LLMs and deep learning in general, this large dataset will enable researchers to uncover complex interactions between multimodal data (news articles, economic indicators, social media) and voting behavior. The raw data, the preprocessed data, and the preprocessing scripts are available on GitHub.
CVMay 22, 2024
Just rotate it! Uncertainty estimation in closed-source models via multiple queriesKonstantinos Pitas, Julyan Arbel
We propose a simple and effective method to estimate the uncertainty of closed-source deep neural network image classification models. Given a base image, our method creates multiple transformed versions and uses them to query the top-1 prediction of the closed-source model. We demonstrate significant improvements in the calibration of uncertainty estimates compared to the naive baseline of assigning 100\% confidence to all predictions. While we initially explore Gaussian perturbations, our empirical findings indicate that natural transformations, such as rotations and elastic deformations, yield even better-calibrated predictions. Furthermore, through empirical results and a straightforward theoretical analysis, we elucidate the reasons behind the superior performance of natural transformations over Gaussian noise. Leveraging these insights, we propose a transfer learning approach that further improves our calibration results.
LGSep 6, 2019
Dissecting Non-Vacuous Generalization Bounds based on the Mean-Field ApproximationKonstantinos Pitas
Explaining how overparametrized neural networks simultaneously achieve low risk and zero empirical risk on benchmark datasets is an open problem. PAC-Bayes bounds optimized using variational inference (VI) have been recently proposed as a promising direction in obtaining non-vacuous bounds. We show empirically that this approach gives negligible gains when modeling the posterior as a Gaussian with diagonal covariance--known as the mean-field approximation. We investigate common explanations, such as the failure of VI due to problems in optimization or choosing a suboptimal prior. Our results suggest that investigating richer posteriors is the most promising direction forward.
LGMay 23, 2019
The role of invariance in spectral complexity-based generalization boundsKonstantinos Pitas, Andreas Loukas, Mike Davies et al.
Deep convolutional neural networks (CNNs) have been shown to be able to fit a random labeling over data while still being able to generalize well for normal labels. Describing CNN capacity through a posteriori measures of complexity has been recently proposed to tackle this apparent paradox. These complexity measures are usually validated by showing that they correlate empirically with GE; being empirically larger for networks trained on random vs normal labels. Focusing on the case of spectral complexity we investigate theoretically and empirically the insensitivity of the complexity measure to invariances relevant to CNNs, and show several limitations of spectral complexity that occur as a result. For a specific formulation of spectral complexity we show that it results in the same upper bound complexity estimates for convolutional and locally connected architectures (which don't have the same favorable invariance properties). This is contrary to common intuition and empirical results.
LGMay 21, 2019
Revisiting hard thresholding for DNN pruningKonstantinos Pitas, Mike Davies, Pierre Vandergheynst
The most common method for DNN pruning is hard thresholding of network weights, followed by retraining to recover any lost accuracy. Recently developed smart pruning algorithms use the DNN response over the training set for a variety of cost functions to determine redundant network weights, leading to less accuracy degradation and possibly less retraining time. For experiments on the total pruning time (pruning time + retraining time) we show that hard thresholding followed by retraining remains the most efficient way of reducing the number of network parameters. However smart pruning algorithms still have advantages when retraining is not possible. In this context we propose a novel smart pruning algorithm based on difference of convex functions optimisation and show that it is often orders of magnitude faster than competing approaches while achieving the lowest classification accuracy degradation. Furthermore we investigate theoretically the effect of hard thresholding on DNN accuracy. We show that accuracy degradation increases with remaining network depth from the pruned layer. We also discover a link between the latent dimensionality of the training data manifold and network robustness to hard thresholding.
LGMar 12, 2018
FeTa: A DCA Pruning Algorithm with Generalization Error GuaranteesKonstantinos Pitas, Mike Davies, Pierre Vandergheynst
Recent DNN pruning algorithms have succeeded in reducing the number of parameters in fully connected layers, often with little or no drop in classification accuracy. However, most of the existing pruning schemes either have to be applied during training or require a costly retraining procedure after pruning to regain classification accuracy. We start by proposing a cheap pruning algorithm for fully connected DNN layers based on difference of convex functions (DC) optimisation, that requires little or no retraining. We then provide a theoretical analysis for the growth in the Generalization Error (GE) of a DNN for the case of bounded perturbations to the hidden layers, of which weight pruning is a special case. Our pruning method is orders of magnitude faster than competing approaches, while our theoretical analysis sheds light to previously observed problems in DNN pruning. Experiments on commnon feedforward neural networks validate our results.
LGDec 30, 2017
PAC-Bayesian Margin Bounds for Convolutional Neural NetworksKonstantinos Pitas, Mike Davies, Pierre Vandergheynst
Recently the generalization error of deep neural networks has been analyzed through the PAC-Bayesian framework, for the case of fully connected layers. We adapt this approach to the convolutional setting.