MLApr 9, 2023
PriorCVAE: scalable MCMC parameter inference with Bayesian deep generative modellingElizaveta Semenova, Prakhar Verma, Max Cairney-Leeming et al. · oxford
Recent advances have shown that GP priors, or their finite realisations, can be encoded using deep generative models such as variational autoencoders (VAEs). These learned generators can serve as drop-in replacements for the original priors during MCMC inference. While this approach enables efficient inference, it loses information about the hyperparameters of the original models, and consequently makes inference over hyperparameters impossible and the learned priors indistinct. To overcome this limitation, we condition the VAE on stochastic process hyperparameters. This allows the joint encoding of hyperparameters with GP realizations and their subsequent estimation during inference. Further, we demonstrate that our proposed method, PriorCVAE, is agnostic to the nature of the models which it approximates, and can be used, for instance, to encode solutions of ODEs. It provides a practical tool for approximate inference and shows potential in real-life spatial and spatiotemporal applications.
LGMar 18
Demystifying amortized causal discovery with transformersFrancesco Montagna, Max Cairney-Leeming, Dhanya Sridhar et al.
Supervised learning for causal discovery from observational data often achieves competitive performance despite seemingly avoiding the explicit assumptions that traditional methods require for identifiability. In this work, we analyze CSIvA (Ke et al., 2023) on bivariate causal models, a transformer architecture for amortized inference promising to train on synthetic data and transfer to real ones. First, we bridge the gap with identifiability theory, showing that the training distribution implicitly defines a prior on the causal model of the test observations: consistent with classical approaches, good performance is achieved when we have a good prior on the test data, and the underlying model is identifiable. Second, we find that CSIvA can not generalize to classes of causal models unseen during training: to overcome this limitation, we theoretically and empirically analyze \textit{when} training CSIvA on datasets generated by multiple identifiable causal models with different structural assumptions improves its generalization at test time. Overall, we find that amortized causal discovery with transformers still adheres to identifiability theory, violating the previous hypothesis from Lopez-Paz et al. (2015) that supervised learning methods could overcome its restrictions.
LGFeb 11
Adaptive Sampling for Private Worst-Case Group OptimizationMax Cairney-Leeming, Amartya Sanyal, Christoph H. Lampert
Models trained by minimizing the average loss often fail to be accurate on small or hard-to-learn groups of the data. Various methods address this issue by optimizing a weighted objective that focuses on the worst-performing groups. However, this approach becomes problematic when learning with differential privacy, as unequal data weighting can result in inhomogeneous privacy guarantees, in particular weaker privacy for minority groups. In this work, we introduce a new algorithm for differentially private worst-case group optimization called ASC (Adaptively Sampled and Clipped Worst-case Group Optimization). It adaptively controls both the sampling rate and the clipping threshold of each group. Thereby, it allows for harder-to-learn groups to be sampled more often while ensuring consistent privacy guarantees across all groups. Comparing ASC to prior work, we show that it results in lower-variance gradients, tighter privacy guarantees, and substantially higher worst-case group accuracy without sacrificing overall average accuracy.