LGDec 6, 2022
PRISM: Probabilistic Real-Time Inference in Spatial World ModelsAtanas Mirchev, Baris Kayalibay, Ahmed Agha et al.
We introduce PRISM, a method for real-time filtering in a probabilistic generative model of agent motion and visual perception. Previous approaches either lack uncertainty estimates for the map and agent state, do not run in real-time, do not have a dense scene representation or do not model agent dynamics. Our solution reconciles all of these aspects. We start from a predefined state-space model which combines differentiable rendering and 6-DoF dynamics. Probabilistic inference in this model amounts to simultaneous localisation and mapping (SLAM) and is intractable. We use a series of approximations to Bayesian inference to arrive at probabilistic map and state estimates. We take advantage of well-established methods and closed-form updates, preserving accuracy and enabling real-time capability. The proposed solution runs at 10Hz real-time and is similarly accurate to state-of-the-art SLAM in small to medium-sized indoor environments, with high-speed UAV and handheld camera agents (Blackbird, EuRoC and TUM-RGBD).
LGApr 20, 2023
Filter-Aware Model-Predictive ControlBaris Kayalibay, Atanas Mirchev, Ahmed Agha et al.
Partially-observable problems pose a trade-off between reducing costs and gathering information. They can be solved optimally by planning in belief space, but that is often prohibitively expensive. Model-predictive control (MPC) takes the alternative approach of using a state estimator to form a belief over the state, and then plan in state space. This ignores potential future observations during planning and, as a result, cannot actively increase or preserve the certainty of its own state estimate. We find a middle-ground between planning in belief space and completely ignoring its dynamics by only reasoning about its future accuracy. Our approach, filter-aware MPC, penalises the loss of information by what we call "trackability", the expected error of the state estimator. We show that model-based simulation allows condensing trackability into a neural network, which allows fast planning. In experiments involving visual navigation, realistic every-day environments and a two-link robot arm, we show that filter-aware MPC vastly improves regular MPC.
LGJan 25, 2022
Tracking and Planning with Spatial World ModelsBaris Kayalibay, Atanas Mirchev, Patrick van der Smagt et al.
We introduce a method for real-time navigation and tracking with differentiably rendered world models. Learning models for control has led to impressive results in robotics and computer games, but this success has yet to be extended to vision-based navigation. To address this, we transfer advances in the emergent field of differentiable rendering to model-based control. We do this by planning in a learned 3D spatial world model, combined with a pose estimation algorithm previously used in the context of TSDF fusion, but now tailored to our setting and improved to incorporate agent dynamics. We evaluate over six simulated environments based on complex human-designed floor plans and provide quantitative results. We achieve up to 92% navigation success rate at a frequency of 15 Hz using only image and depth observations under stochastic, continuous dynamics.
LGJan 18, 2021
Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable ModelsJustin Bayer, Maximilian Soelch, Atanas Mirchev et al.
Amortised inference enables scalable learning of sequential latent-variable models (LVMs) with the evidence lower bound (ELBO). In this setting, variational posteriors are often only partially conditioned. While the true posteriors depend, e.g., on the entire sequence of observations, approximate posteriors are only informed by past observations. This mimics the Bayesian filter -- a mixture of smoothing posteriors. Yet, we show that the ELBO objective forces partially-conditioned amortised posteriors to approximate products of smoothing posteriors instead. Consequently, the learned generative model is compromised. We demonstrate these theoretical findings in three scenarios: traffic flow, handwritten digits, and aerial vehicle dynamics. Using fully-conditioned approximate posteriors, performance improves in terms of generative modelling and multi-step prediction.
MLJun 17, 2020
Variational State-Space Models for Localisation and Dense 3D Mapping in 6 DoFAtanas Mirchev, Baris Kayalibay, Patrick van der Smagt et al.
We solve the problem of 6-DoF localisation and 3D dense reconstruction in spatial environments as approximate Bayesian inference in a deep state-space model. Our approach leverages both learning and domain knowledge from multiple-view geometry and rigid-body dynamics. This results in an expressive predictive model of the world, often missing in current state-of-the-art visual SLAM solutions. The combination of variational inference, neural networks and a differentiable raycaster ensures that our model is amenable to end-to-end gradient-based optimisation. We evaluate our approach on realistic unmanned aerial vehicle flight data, nearing the performance of state-of-the-art visual-inertial odometry systems. We demonstrate the applicability of the model to generative prediction and planning.
MLFeb 12, 2020
Learning Flat Latent Manifolds with VAEsNutan Chen, Alexej Klushyn, Francesco Ferroni et al.
Measuring the similarity between data points often requires domain knowledge, which can in parts be compensated by relying on unsupervised methods such as latent-variable models, where similarity/distance is estimated in a more compact latent space. Prevalent is the use of the Euclidean metric, which has the drawback of ignoring information about similarity of data stored in the decoder, as captured by the framework of Riemannian geometry. We propose an extension to the framework of variational auto-encoders allows learning flat latent manifolds, where the Euclidean metric is a proxy for the similarity between data points. This is achieved by defining the latent space as a Riemannian manifold and by regularising the metric tensor to be a scaled identity matrix. Additionally, we replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one---and formulate the learning problem as a constrained optimisation problem. We evaluate our method on a range of data-sets, including a video-tracking benchmark, where the performance of our unsupervised approach nears that of state-of-the-art supervised approaches, while retaining the computational efficiency of straight-line-based approaches.
MLOct 14, 2019
Variational Tracking and Prediction with Generative Disentangled State-Space ModelsAdnan Akhundov, Maximilian Soelch, Justin Bayer et al.
We address tracking and prediction of multiple moving objects in visual data streams as inference and sampling in a disentangled latent state-space model. By encoding objects separately and including explicit position information in the latent state space, we perform tracking via amortized variational Bayesian inference of the respective latent positions. Inference is implemented in a modular neural framework tailored towards our disentangled latent space. Generative and inference model are jointly learned from observations only. Comparing to related prior work, we empirically show that our Markovian state-space assumption enables faithful and much improved long-term prediction well beyond the training horizon. Further, our inference model correctly decomposes frames into objects, even in the presence of occlusions. Tracking performance is increased significantly over prior art.
MLAug 23, 2019
Increasing the Generalisation Capacity of Conditional VAEsAlexej Klushyn, Nutan Chen, Botond Cseke et al.
We address the problem of one-to-many mappings in supervised learning, where a single instance has many different solutions of possibly equal cost. The framework of conditional variational autoencoders describes a class of methods to tackle such structured-prediction tasks by means of latent variables. We propose to incentivise informative latent representations for increasing the generalisation capacity of conditional variational autoencoders. To this end, we modify the latent variable model by defining the likelihood as a function of the latent variable only and introduce an expressive multimodal prior to enable the model for capturing semantically meaningful features of the data. To validate our approach, we train our model on the Cornell Robot Grasping dataset, and modified versions of MNIST and Fashion-MNIST obtaining results that show a significantly higher generalisation capability.
MLMar 18, 2019
On Deep Set Learning and the Choice of AggregationsMaximilian Soelch, Adnan Akhundov, Patrick van der Smagt et al.
Recently, it has been shown that many functions on sets can be represented by sum decompositions. These decompositons easily lend themselves to neural approximations, extending the applicability of neural nets to set-valued inputs---Deep Set learning. This work investigates a core component of Deep Set architecture: aggregation functions. We suggest and examine alternatives to commonly used aggregation functions, including learnable recurrent aggregation functions. Empirically, we show that the Deep Set networks are highly sensitive to the choice of aggregation functions: beyond improved performance, we find that learnable aggregations lower hyper-parameter sensitivity and generalize better to out-of-distribution input size.
MLJan 14, 2019
Bayesian Learning of Neural Network ArchitecturesGeorgi Dikov, Patrick van der Smagt, Justin Bayer
In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learnt structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require a retraining of the model, thus keeping the computational overhead at minimum.
MLDec 19, 2018
Fast Approximate Geodesics for Deep Generative ModelsNutan Chen, Francesco Ferroni, Alexej Klushyn et al.
The length of the geodesic between two data points along a Riemannian manifold, induced by a deep generative model, yields a principled measure of similarity. Current approaches are limited to low-dimensional latent spaces, due to the computational complexity of solving a non-convex optimisation problem. We propose finding shortest paths in a finite graph of samples from the aggregate approximate posterior, that can be solved exactly, at greatly reduced runtime, and without a notable loss in quality. Our approach, therefore, is hence applicable to high-dimensional problems, e.g., in the visual domain. We validate our approach empirically on a series of experiments using variational autoencoders applied to image data, including the Chair, FashionMNIST, and human movement data sets.
MLMay 18, 2018
Approximate Bayesian inference in spatial environmentsAtanas Mirchev, Baris Kayalibay, Maximilian Soelch et al.
Model-based approaches bear great promise for decision making of agents interacting with the physical world. In the context of spatial environments, different types of problems such as localisation, mapping, navigation or autonomous exploration are typically adressed with specialised methods, often relying on detailed knowledge of the system at hand. We express these tasks as probabilistic inference and planning under the umbrella of deep sequential generative models. Using the frameworks of variational inference and neural networks, our method inherits favourable properties such as flexibility, scalability and the ability to learn from data. The method performs comparably to specialised state-of-the-art methodology in two distinct simulated environments.
MLNov 3, 2017
Metrics for Deep Generative ModelsNutan Chen, Alexej Klushyn, Richard Kurle et al.
Neural samplers such as variational autoencoders (VAEs) or generative adversarial networks (GANs) approximate distributions by transforming samples from a simple random source---the latent space---to samples from a more complex distribution represented by a dataset. While the manifold hypothesis implies that the density induced by a dataset contains large regions of low density, the training criterions of VAEs and GANs will make the latent space densely covered. Consequently points that are separated by low-density regions in observation space will be pushed together in latent space, making stationary distances poor proxies for similarity. We transfer ideas from Riemannian geometry to this setting, letting the distance between two points be the shortest path on a Riemannian manifold induced by the transformation. The method yields a principled distance measure, provides a tool for visual inspection of deep generative models, and an alternative to linear interpolation in latent space. In addition, it can be applied for robot movement generalization using previously learned skills. The method is evaluated on a synthetic dataset with known ground truth; on a simulated robot arm dataset; on human motion capture data; and on a generative model of handwritten digits.
MLOct 13, 2017
Unsupervised Real-Time Control through Variational EmpowermentMaximilian Karl, Maximilian Soelch, Philip Becker-Ehmck et al.
We introduce a methodology for efficiently computing a lower bound to empowerment, allowing it to be used as an unsupervised cost function for policy learning in real-time control. Empowerment, being the channel capacity between actions and states, maximises the influence of an agent on its near future. It has been shown to be a good model of biological behaviour in the absence of an extrinsic goal. But empowerment is also prohibitively hard to compute, especially in nonlinear continuous spaces. We introduce an efficient, amortised method for learning empowerment-maximising policies. We demonstrate that our algorithm can reliably handle continuous dynamical systems using system dynamics learned from raw data. The resulting policies consistently drive the agents into states where they can use their full potential.
ROJun 23, 2016
Unsupervised preprocessing for Tactile DataMaximilian Karl, Justin Bayer, Patrick van der Smagt
Tactile information is important for gripping, stable grasp, and in-hand manipulation, yet the complexity of tactile data prevents widespread use of such sensors. We make use of an unsupervised learning algorithm that transforms the complex tactile data into a compact, latent representation without the need to record ground truth reference data. These compact representations can either be used directly in a reinforcement learning based controller or can be used to calibrate the tactile sensor to physical quantities with only a few datapoints. We show the quality of our latent representation by predicting important features and with a simple control task.
ROJun 21, 2016
ML-based tactile sensor calibration: A universal approachMaximilian Karl, Artur Lohrer, Dhananjay Shah et al.
We study the responses of two tactile sensors, the fingertip sensor from the iCub and the BioTac under different external stimuli. The question of interest is to which degree both sensors i) allow the estimation of force exerted on the sensor and ii) enable the recognition of differing degrees of curvature. Making use of a force controlled linear motor affecting the tactile sensors we acquire several high-quality data sets allowing the study of both sensors under exactly the same conditions. We also examined the structure of the representation of tactile stimuli in the recorded tactile sensor data using t-SNE embeddings. The experiments show that both the iCub and the BioTac excel in different settings.
MLMay 20, 2016
Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw DataMaximilian Karl, Maximilian Soelch, Justin Bayer et al.
We introduce Deep Variational Bayes Filters (DVBF), a new method for unsupervised learning and identification of latent Markovian state space models. Leveraging recent advances in Stochastic Gradient Variational Bayes, DVBF can overcome intractable inference distributions via variational inference. Thus, it can handle highly nonlinear input data with temporal and spatial dependencies such as image sequences without domain knowledge. Our experiments show that enabling backpropagation through transitions enforces state space assumptions and significantly improves information content of the latent embedding. This also enables realistic long-term prediction.
SCMay 9, 2016
Theano: A Python framework for fast computation of mathematical expressionsThe Theano Development Team, Rami Al-Rfou, Guillaume Alain et al.
Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
MLFeb 23, 2016
Variational Inference for On-line Anomaly Detection in High-Dimensional Time SeriesMaximilian Soelch, Justin Bayer, Marvin Ludersdorfer et al.
Approximate variational inference has shown to be a powerful tool for modeling unknown complex probability distributions. Recent advances in the field allow us to learn probabilistic models of sequences that actively exploit spatial and temporal structure. We apply a Stochastic Recurrent Network (STORN) to learn robot time series data. Our evaluation demonstrates that we can robustly detect anomalies both off- and on-line.
MLSep 28, 2015
Efficient EmpowermentMaximilian Karl, Justin Bayer, Patrick van der Smagt
Empowerment quantifies the influence an agent has on its environment. This is formally achieved by the maximum of the expected KL-divergence between the distribution of the successor state conditioned on a specific action and a distribution where the actions are marginalised out. This is a natural candidate for an intrinsic reward signal in the context of reinforcement learning: the agent will place itself in a situation where its action have maximum stability and maximum influence on the future. The limiting factor so far has been the computational complexity of the method: the only way of calculation has so far been a brute force algorithm, reducing the applicability of the method to environments with a small set discrete states. In this work, we propose to use an efficient approximation for marginalising out the actions in the case of continuous environments. This allows fast evaluation of empowerment, paving the way towards challenging environments such as real world robotics. The method is presented on a pendulum swing up problem.
MLJul 19, 2015
Fast Adaptive Weight NoiseJustin Bayer, Maximilian Karl, Daniela Korhammer et al.
Marginalising out uncertain quantities within the internal representations or parameters of neural networks is of central importance for a wide range of learning techniques, such as empirical, variational or full Bayesian methods. We set out to generalise fast dropout (Wang & Manning, 2013) to cover a wider variety of noise processes in neural networks. This leads to an efficient calculation of the marginal likelihood and predictive distribution which evades sampling and the consequential increase in training time due to highly variant gradient estimates. This allows us to approximate variational Bayes for the parameters of feed-forward neural networks. Inspired by the minimum description length principle, we also propose and experimentally verify the direct optimisation of the regularised predictive distribution. The methods yield results competitive with previous neural network based approaches and Gaussian processes on a wide range of regression tasks.
MLNov 27, 2014
Learning Stochastic Recurrent NetworksJustin Bayer, Christian Osendorfer
Leveraging advances in variational inference, we propose to enhance recurrent neural networks with latent variables, resulting in Stochastic Recurrent Networks (STORNs). The model i) can be trained with stochastic gradient methods, ii) allows structured and multi-modal conditionals at each time step, iii) features a reliable estimator of the marginal likelihood and iv) is a generalisation of deterministic recurrent neural networks. We evaluate the method on four polyphonic musical data sets and motion capture data.
MLOct 21, 2014
Regularizing Recurrent Networks - On Injected Noise and Norm-based MethodsSaahil Ognawala, Justin Bayer
Advancements in parallel processing have lead to a surge in multilayer perceptrons' (MLP) applications and deep learning in the past decades. Recurrent Neural Networks (RNNs) give additional representational power to feedforward MLPs by providing a way to treat sequential data. However, RNNs are hard to train using conventional error backpropagation methods because of the difficulty in relating inputs over many time-steps. Regularization approaches from MLP sphere, like dropout and noisy weight training, have been insufficiently applied and tested on simple RNNs. Moreover, solutions have been proposed to improve convergence in RNNs but not enough to improve the long term dependency remembering capabilities thereof. In this study, we aim to empirically evaluate the remembering and generalization ability of RNNs on polyphonic musical datasets. The models are trained with injected noise, random dropout, norm-based regularizers and their respective performances compared to well-initialized plain RNNs and advanced regularization methods like fast-dropout. We conclude with evidence that training with noise does not improve performance as conjectured by a few works in RNN optimization before ours.
MLJun 6, 2014
Variational inference of latent state sequences using Recurrent NetworksJustin Bayer, Christian Osendorfer
Recent advances in the estimation of deep directed graphical models and recurrent networks let us contribute to the removal of a blind spot in the area of probabilistc modelling of time series. The proposed methods i) can infer distributed latent state-space trajectories with nonlinear transitions, ii) scale to large data sets thanks to the use of a stochastic objective and fast, approximate inference, iii) enable the design of rich emission models which iv) will naturally lead to structured outputs. Two different paths of introducing latent state sequences are pursued, leading to the variational recurrent auto encoder (VRAE) and the variational one step predictor (VOSP). The use of independent Wiener processes as priors on the latent state sequence is a viable compromise between efficient computation of the Kullback-Leibler divergence from the variational approximation of the posterior and maintaining a reasonable belief in the dynamics. We verify our methods empirically, obtaining results close or superior to the state of the art. We also show qualitative results for denoising and missing value imputation.
MLNov 4, 2013
On Fast Dropout and its Applicability to Recurrent NetworksJustin Bayer, Christian Osendorfer, Daniela Korhammer et al.
Recurrent Neural Networks (RNNs) are rich models for the processing of sequential data. Recent work on advancing the state of the art has been focused on the optimization or modelling of RNNs, mostly motivated by adressing the problems of the vanishing and exploding gradients. The control of overfitting has seen considerably less attention. This paper contributes to that by analyzing fast dropout, a recent regularization method for generalized linear models and neural networks from a back-propagation inspired perspective. We show that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions and vanishes at minima of an unregularized training loss. The derivatives of that regularizer are exclusively based on the training error signal. One consequence of this is the absense of a global weight attractor, which is particularly appealing for RNNs, since the dynamics are not biased towards a certain regime. We positively test the hypothesis that this improves the performance of RNNs on four musical data sets.
CVApr 30, 2013
Convolutional Neural Networks learn compact local image descriptorsChristian Osendorfer, Justin Bayer, Patrick van der Smagt
A standard deep convolutional neural network paired with a suitable loss function learns compact local image descriptors that perform comparably to state-of-the art approaches.
CVJan 14, 2013
Unsupervised Feature Learning for low-level Local Image DescriptorsChristian Osendorfer, Justin Bayer, Sebastian Urban et al.
Unsupervised feature learning has shown impressive results for a wide range of input modalities, in particular for object classification tasks in computer vision. Using a large amount of unlabeled data, unsupervised feature learning methods are utilized to construct high-level representations that are discriminative enough for subsequently trained supervised classification algorithms. However, it has never been \emph{quantitatively} investigated yet how well unsupervised learning methods can find \emph{low-level representations} for image patches without any additional supervision. In this paper we examine the performance of pure unsupervised methods on a low-level correspondence task, a problem that is central to many Computer Vision applications. We find that a special type of Restricted Boltzmann Machines (RBMs) performs comparably to hand-crafted descriptors. Additionally, a simple binarization scheme produces compact representations that perform better than several state-of-the-art descriptors.