Heiko Strathmann

ML
h-index28
18papers
1,588citations
Novelty61%
AI Score33

18 Papers

CVJan 13, 2023
Laser: Latent Set Representations for 3D Generative Modeling

Pol Moreno, Adam R. Kosiorek, Heiko Strathmann et al. · deepmind

NeRF provides unparalleled fidelity of novel view synthesis: rendering a 3D scene from an arbitrary viewpoint. NeRF requires training on a large number of views that fully cover a scene, which limits its applicability. While these issues can be addressed by learning a prior over scenes in various forms, previous approaches have been either applied to overly simple scenes or struggling to render unobserved parts. We introduce Laser-NV: a generative model which achieves high modelling capacity, and which is based on a set-valued latent representation modelled by normalizing flows. Similarly to previous amortized approaches, Laser-NV learns structure from multiple scenes and is capable of fast, feed-forward inference from few views. To encourage higher rendering fidelity and consistency with observed views, Laser-NV further incorporates a geometry-informed attention mechanism over the observed views. Laser-NV further produces diverse and plausible completions of occluded parts of a scene while remaining consistent with observations. Laser-NV shows state-of-the-art novel-view synthesis quality when evaluated on ShapeNet and on a novel simulated City dataset, which features high uncertainty in the unobserved regions of the scene.

MLAug 16, 2022
Score-Based Diffusion meets Annealed Importance Sampling

Arnaud Doucet, Will Grathwohl, Alexander G. D. G. Matthews et al.

More than twenty years after its introduction, Annealed Importance Sampling (AIS) remains one of the most effective methods for marginal likelihood estimation. It relies on a sequence of distributions interpolating between a tractable initial distribution and the target distribution of interest which we simulate from approximately using a non-homogeneous Markov chain. To obtain an importance sampling estimate of the marginal likelihood, AIS introduces an extended target distribution to reweight the Markov chain proposal. While much effort has been devoted to improving the proposal distribution used by AIS, an underappreciated issue is that AIS uses a convenient but suboptimal extended target distribution. We here leverage recent progress in score-based generative modeling (SGM) to approximate the optimal extended target distribution minimizing the variance of the marginal likelihood estimate for AIS proposals corresponding to the discretization of Langevin and Hamiltonian dynamics. We demonstrate these novel, differentiable, AIS procedures on a number of synthetic benchmark distributions and variational auto-encoders.

MLJul 19, 2013Code
Kernel Adaptive Metropolis-Hastings

Dino Sejdinovic, Heiko Strathmann, Maria Lomeli Garcia et al.

A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher order information about the target, making it particularly attractive for contexts such as Pseudo-Marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions, arising in both real-world and synthetic examples. Code may be downloaded at https://github.com/karlnapf/kameleon-mcmc.

ROMar 13, 2024
Scaling Instructable Agents Across Many Simulated Worlds

SIMA Team, Maria Abi Raad, Arun Ahuja et al. · deepmind, stanford

Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.

LGJul 22, 2021
Neural Variational Gradient Descent

Lauro Langosco di Langosco, Vincent Fortuin, Heiko Strathmann

Particle-based approximate Bayesian inference approaches such as Stein Variational Gradient Descent (SVGD) combine the flexibility and convergence guarantees of sampling methods with the computational benefits of variational inference. In practice, SVGD relies on the choice of an appropriate kernel function, which impacts its ability to model the target distribution -- a challenging problem with only heuristic solutions. We propose Neural Variational Gradient Descent (NVGD), which is based on parameterizing the witness function of the Stein discrepancy by a deep neural network whose parameters are learned in parallel to the inference, mitigating the necessity to make any kernel choices whatsoever. We empirically evaluate our method on popular synthetic inference problems, real-world Bayesian linear regression, and Bayesian neural network inference.

MLApr 1, 2021
NeRF-VAE: A Geometry Aware 3D Scene Generative Model

Adam R. Kosiorek, Heiko Strathmann, Daniel Zoran et al.

We propose NeRF-VAE, a 3D scene generative model that incorporates geometric structure via NeRF and differentiable volume rendering. In contrast to NeRF, our model takes into account shared structure across scenes, and is able to infer the structure of a novel scene -- without the need to re-train -- using amortized inference. NeRF-VAE's explicit 3D rendering process further contrasts previous generative models with convolution-based rendering which lacks geometric structure. Our model is a VAE that learns a distribution over radiance fields by conditioning them on a latent scene representation. We show that, once trained, NeRF-VAE is able to infer and render geometrically-consistent scenes from previously unseen 3D environments using very few input images. We further demonstrate that NeRF-VAE generalizes well to out-of-distribution cameras, while convolutional models do not. Finally, we introduce and study an attention-based conditioning mechanism of NeRF-VAE's decoder, which improves model performance.

LGMar 1, 2021
Persistent Message Passing

Heiko Strathmann, Mohammadamin Barekatain, Charles Blundell et al.

Graph neural networks (GNNs) are a powerful inductive bias for modelling algorithmic reasoning procedures and data structures. Their prowess was mainly demonstrated on tasks featuring Markovian dynamics, where querying any associated data structure depends only on its latest state. For many tasks of interest, however, it may be highly beneficial to support efficient data structure queries dependent on previous states. This requires tracking the data structure's evolution through time, placing significant pressure on the GNN's latent representations. We introduce Persistent Message Passing (PMP), a mechanism which endows GNNs with capability of querying past state by explicitly persisting it: rather than overwriting node representations, it creates new nodes whenever required. PMP generalises out-of-distribution to more than 2x larger test inputs on dynamic temporal range queries, significantly outperforming GNNs which overwrite states.

MLJan 23, 2019
Meta-Learning Mean Functions for Gaussian Processes

Vincent Fortuin, Heiko Strathmann, Gunnar Rätsch

When fitting Bayesian machine learning models on scarce data, the main challenge is to obtain suitable prior knowledge and encode it into the model. Recent advances in meta-learning offer powerful methods for extracting such prior knowledge from data acquired in related tasks. When it comes to meta-learning in Gaussian process models, approaches in this setting have mostly focused on learning the kernel function of the prior, but not on learning its mean function. In this work, we explore meta-learning the mean function of a Gaussian process prior. We present analytical and empirical evidence that mean function learning can be useful in the meta-learning setting, discuss the risk of overfitting, and draw connections to other meta-learning approaches, such as model agnostic meta-learning and functional PCA.

MLNov 20, 2018
Learning deep kernels for exponential family densities

Li Wenliang, Danica J. Sutherland, Heiko Strathmann et al.

The kernel exponential family is a rich class of distributions, which can be fit efficiently and with statistical guarantees by score matching. Being required to choose a priori a simple kernel such as the Gaussian, however, limits its practical applicability. We provide a scheme for learning a kernel parameterized by a deep network, which can find complex location-dependent local features of the data geometry. This gives a very rich class of density models, capable of fitting complex structures on moderate-dimensional problems. Compared to deep density models fit via maximum likelihood, our approach provides a complementary set of strengths and tradeoffs: in empirical studies, the former can yield higher likelihoods, whereas the latter gives better estimates of the gradient of the log density, the score, which describes the distribution's shape.

MLOct 24, 2018
Scalable Gaussian Processes on Discrete Domains

Vincent Fortuin, Gideon Dresdner, Heiko Strathmann et al.

Kernel methods on discrete domains have shown great promise for many challenging data types, for instance, biological sequence data and molecular structure data. Scalable kernel methods like Support Vector Machines may offer good predictive performances but do not intrinsically provide uncertainty estimates. In contrast, probabilistic kernel methods like Gaussian Processes offer uncertainty estimates in addition to good predictive performance but fall short in terms of scalability. While the scalability of Gaussian processes can be improved using sparse inducing point approximations, the selection of these inducing points remains challenging. We explore different techniques for selecting inducing points on discrete domains, including greedy selection, determinantal point processes, and simulated annealing. We find that simulated annealing, which can select inducing points that are not in the training set, can perform competitively with support vector machines and full Gaussian processes on synthetic data, as well as on challenging real-world DNA sequence data.

LGJun 6, 2018
SOM-VAE: Interpretable Discrete Representation Learning on Time Series

Vincent Fortuin, Matthias Hüser, Francesco Locatello et al.

High-dimensional time series are common in many domains. Since human cognition is not optimized to work well in high-dimensional spaces, these areas could benefit from interpretable low-dimensional representations. However, most representation learning algorithms for time series data are difficult to interpret. This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose a new representation learning framework building on ideas from interpretable discrete dimensionality reduction and deep generative modeling. This framework allows us to learn discrete representations of time series, which give rise to smooth and interpretable embeddings with superior clustering performance. We introduce a new way to overcome the non-differentiability in discrete representation learning and present a gradient-based version of the traditional self-organizing map algorithm that is more performant than the original. Furthermore, to allow for a probabilistic interpretation of our method, we integrate a Markov model in the representation space. This model uncovers the temporal transition structure, improves clustering performance even further and provides additional explanatory insights as well as a natural representation of uncertainty. We evaluate our model in terms of clustering performance and interpretability on static (Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST images, a chaotic Lorenz attractor system with two macro states, as well as on a challenging real world medical time series application on the eICU data set. Our learned representations compare favorably with competitor methods and facilitate downstream tasks on the real world data.

MLMay 23, 2017
Efficient and principled score estimation with Nyström kernel exponential families

Danica J. Sutherland, Heiko Strathmann, Michael Arbel et al.

We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional. The model is learned by fitting the derivative of the log density, the score, thus avoiding the need to compute a normalization constant. Our approach improves the computational efficiency of an earlier solution by using a low-rank, Nyström-like solution. The new solution retains the consistency and convergence rates of the full-rank solution (exactly in Fisher distance, and nearly in other distances), with guarantees on the degree of cost and storage reduction. We evaluate the method in experiments on density estimation and in the construction of an adaptive Hamiltonian Monte Carlo sampler. Compared to an existing score learning approach using a denoising autoencoder, our estimator is empirically more data-efficient when estimating the score, runs faster, and has fewer parameters (which can be tuned in a principled and interpretable way), in addition to providing statistical guarantees.

MLNov 14, 2016
Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann et al.

We propose a method to optimize the representation and distinguishability of samples from two probability distributions, by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). This optimized MMD is applied to the setting of unsupervised learning by generative adversarial networks (GAN), in which a model attempts to generate realistic samples, and a discriminator attempts to tell these apart from data samples. In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples. Second, the MMD can be used to evaluate the performance of a generative model, by testing the model's samples against a reference data set. In the latter role, the optimized MMD is particularly helpful, as it gives an interpretable indication of how the model and data distributions differ, even in cases where individual model samples are not easily distinguished either by eye or by classifier.

MLFeb 9, 2016
A Kernel Test of Goodness of Fit

Kacper Chwialkowski, Heiko Strathmann, Arthur Gretton

We propose a nonparametric statistical test for goodness-of-fit: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness-of-fit is a divergence constructed via Stein's method using functions from a Reproducing Kernel Hilbert Space. Our test statistic is based on an empirical estimate of this divergence, taking the form of a V-statistic in terms of the log gradients of the target density and the kernel. We derive a statistical test, both for i.i.d. and non-i.i.d. samples, where we estimate the null distribution quantiles using a wild bootstrap procedure. We apply our test to quantifying convergence of approximate Markov Chain Monte Carlo methods, statistical model criticism, and evaluating quality of fit vs model complexity in nonparametric density estimation.

COOct 11, 2015
Kernel Sequential Monte Carlo

Ingmar Schuster, Heiko Strathmann, Brooks Paige et al.

We propose kernel sequential Monte Carlo (KSMC), a framework for sampling from static target densities. KSMC is a family of sequential Monte Carlo algorithms that are based on building emulator models of the current particle system in a reproducing kernel Hilbert space. We here focus on modelling nonlinear covariance structure and gradients of the target. The emulator's geometry is adaptively updated and subsequently used to inform local proposals. Unlike in adaptive Markov chain Monte Carlo, continuous adaptation does not compromise convergence of the sampler. KSMC combines the strengths of sequental Monte Carlo and kernel methods: superior performance for multimodal targets and the ability to estimate model evidence as compared to Markov chain Monte Carlo, and the emulator's ability to represent targets that exhibit high degrees of nonlinearity. As KSMC does not require access to target gradients, it is particularly applicable on targets whose gradients are unknown or prohibitively expensive. We describe necessary tuning details and demonstrate the benefits of the the proposed methodology on a series of challenging synthetic and real-world examples.

MLJun 8, 2015
Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families

Heiko Strathmann, Dino Sejdinovic, Samuel Livingstone et al.

We propose Kernel Hamiltonian Monte Carlo (KMC), a gradient-free adaptive MCMC algorithm based on Hamiltonian Monte Carlo (HMC). On target densities where classical HMC is not an option due to intractable gradients, KMC adaptively learns the target's gradient structure by fitting an exponential family model in a Reproducing Kernel Hilbert Space. Computational costs are reduced by two novel efficient approximations to this gradient. While being asymptotically exact, KMC mimics HMC in terms of sampling efficiency, and offers substantial mixing improvements over state-of-the-art gradient free samplers. We support our claims with experimental studies on both toy and real-world applications, including Approximate Bayesian Computation and exact-approximate MCMC.

MLJan 14, 2015
Unbiased Bayes for Big Data: Paths of Partial Posteriors

Heiko Strathmann, Dino Sejdinovic, Mark Girolami

A key quantity of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov Chain Monte Carlo is a fundamental tool to consistently compute these expectations via averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so called Big Data as all data needs to be processed in every iteration. Realising that such simulation is an unnecessarily hard problem if the goal is estimation, we construct a computationally scalable methodology that allows unbiased estimation of the required expectations -- without explicit simulation from the full posterior. The scheme's variance is finite by construction and straightforward to control, leading to algorithms that are provably unbiased and naturally arrive at a desired error tolerance. This is achieved at an average computational complexity that is sub-linear in the size of the dataset and its free parameters are easy to tune. We demonstrate the utility and generality of the methodology on a range of common statistical models applied to large-scale benchmark and real-world datasets.

MEJun 17, 2013
On Russian Roulette Estimates for Bayesian Inference with Doubly-Intractable Likelihoods

Anne-Marie Lyne, Mark Girolami, Yves Atchadé et al.

A large number of statistical models are "doubly-intractable": the likelihood normalising term, which is a function of the model parameters, is intractable, as well as the marginal likelihood (model evidence). This means that standard inference techniques to sample from the posterior, such as Markov chain Monte Carlo (MCMC), cannot be used. Examples include, but are not confined to, massive Gaussian Markov random fields, autologistic models and Exponential random graph models. A number of approximate schemes based on MCMC techniques, Approximate Bayesian computation (ABC) or analytic approximations to the posterior have been suggested, and these are reviewed here. Exact MCMC schemes, which can be applied to a subset of doubly-intractable distributions, have also been developed and are described in this paper. As yet, no general method exists which can be applied to all classes of models with doubly-intractable posteriors. In addition, taking inspiration from the Physics literature, we study an alternative method based on representing the intractable likelihood as an infinite series. Unbiased estimates of the likelihood can then be obtained by finite time stochastic truncation of the series via Russian Roulette sampling, although the estimates are not necessarily positive. Results from the Quantum Chromodynamics literature are exploited to allow the use of possibly negative estimates in a pseudo-marginal MCMC scheme such that expectations with respect to the posterior distribution are preserved. The methodology is reviewed on well-known examples such as the parameters in Ising models, the posterior for Fisher-Bingham distributions on the $d$-Sphere and a large-scale Gaussian Markov Random Field model describing the Ozone Column data. This leads to a critical assessment of the strengths and weaknesses of the methodology with pointers to ongoing research.