LGDec 6, 2022
PRISM: Probabilistic Real-Time Inference in Spatial World ModelsAtanas Mirchev, Baris Kayalibay, Ahmed Agha et al.
We introduce PRISM, a method for real-time filtering in a probabilistic generative model of agent motion and visual perception. Previous approaches either lack uncertainty estimates for the map and agent state, do not run in real-time, do not have a dense scene representation or do not model agent dynamics. Our solution reconciles all of these aspects. We start from a predefined state-space model which combines differentiable rendering and 6-DoF dynamics. Probabilistic inference in this model amounts to simultaneous localisation and mapping (SLAM) and is intractable. We use a series of approximations to Bayesian inference to arrive at probabilistic map and state estimates. We take advantage of well-established methods and closed-form updates, preserving accuracy and enabling real-time capability. The proposed solution runs at 10Hz real-time and is similarly accurate to state-of-the-art SLAM in small to medium-sized indoor environments, with high-speed UAV and handheld camera agents (Blackbird, EuRoC and TUM-RGBD).
LGApr 20, 2023
Filter-Aware Model-Predictive ControlBaris Kayalibay, Atanas Mirchev, Ahmed Agha et al.
Partially-observable problems pose a trade-off between reducing costs and gathering information. They can be solved optimally by planning in belief space, but that is often prohibitively expensive. Model-predictive control (MPC) takes the alternative approach of using a state estimator to form a belief over the state, and then plan in state space. This ignores potential future observations during planning and, as a result, cannot actively increase or preserve the certainty of its own state estimate. We find a middle-ground between planning in belief space and completely ignoring its dynamics by only reasoning about its future accuracy. Our approach, filter-aware MPC, penalises the loss of information by what we call "trackability", the expected error of the state estimator. We show that model-based simulation allows condensing trackability into a neural network, which allows fast planning. In experiments involving visual navigation, realistic every-day environments and a two-link robot arm, we show that filter-aware MPC vastly improves regular MPC.
LGJan 25, 2022
Tracking and Planning with Spatial World ModelsBaris Kayalibay, Atanas Mirchev, Patrick van der Smagt et al.
We introduce a method for real-time navigation and tracking with differentiably rendered world models. Learning models for control has led to impressive results in robotics and computer games, but this success has yet to be extended to vision-based navigation. To address this, we transfer advances in the emergent field of differentiable rendering to model-based control. We do this by planning in a learned 3D spatial world model, combined with a pose estimation algorithm previously used in the context of TSDF fusion, but now tailored to our setting and improved to incorporate agent dynamics. We evaluate over six simulated environments based on complex human-designed floor plans and provide quantitative results. We achieve up to 92% navigation success rate at a frequency of 15 Hz using only image and depth observations under stochastic, continuous dynamics.
LGJan 18, 2021
Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable ModelsJustin Bayer, Maximilian Soelch, Atanas Mirchev et al.
Amortised inference enables scalable learning of sequential latent-variable models (LVMs) with the evidence lower bound (ELBO). In this setting, variational posteriors are often only partially conditioned. While the true posteriors depend, e.g., on the entire sequence of observations, approximate posteriors are only informed by past observations. This mimics the Bayesian filter -- a mixture of smoothing posteriors. Yet, we show that the ELBO objective forces partially-conditioned amortised posteriors to approximate products of smoothing posteriors instead. Consequently, the learned generative model is compromised. We demonstrate these theoretical findings in three scenarios: traffic flow, handwritten digits, and aerial vehicle dynamics. Using fully-conditioned approximate posteriors, performance improves in terms of generative modelling and multi-step prediction.
MLJun 17, 2020
Variational State-Space Models for Localisation and Dense 3D Mapping in 6 DoFAtanas Mirchev, Baris Kayalibay, Patrick van der Smagt et al.
We solve the problem of 6-DoF localisation and 3D dense reconstruction in spatial environments as approximate Bayesian inference in a deep state-space model. Our approach leverages both learning and domain knowledge from multiple-view geometry and rigid-body dynamics. This results in an expressive predictive model of the world, often missing in current state-of-the-art visual SLAM solutions. The combination of variational inference, neural networks and a differentiable raycaster ensures that our model is amenable to end-to-end gradient-based optimisation. We evaluate our approach on realistic unmanned aerial vehicle flight data, nearing the performance of state-of-the-art visual-inertial odometry systems. We demonstrate the applicability of the model to generative prediction and planning.
MLMay 18, 2018
Approximate Bayesian inference in spatial environmentsAtanas Mirchev, Baris Kayalibay, Maximilian Soelch et al.
Model-based approaches bear great promise for decision making of agents interacting with the physical world. In the context of spatial environments, different types of problems such as localisation, mapping, navigation or autonomous exploration are typically adressed with specialised methods, often relying on detailed knowledge of the system at hand. We express these tasks as probabilistic inference and planning under the umbrella of deep sequential generative models. Using the frameworks of variational inference and neural networks, our method inherits favourable properties such as flexibility, scalability and the ability to learn from data. The method performs comparably to specialised state-of-the-art methodology in two distinct simulated environments.
CVJan 11, 2017
CNN-based Segmentation of Medical Imaging DataBaris Kayalibay, Grady Jensen, Patrick van der Smagt
Convolutional neural networks have been applied to a wide variety of computer vision tasks. Recent advances in semantic segmentation have enabled their application to medical image segmentation. While most CNNs use two-dimensional kernels, recent CNN-based publications on medical image segmentation featured three-dimensional kernels, allowing full access to the three-dimensional structure of medical images. Though closely related to semantic segmentation, medical image segmentation includes specific challenges that need to be addressed, such as the scarcity of labelled data, the high class imbalance found in the ground truth and the high memory demand of three-dimensional images. In this work, a CNN-based method with three-dimensional filters is demonstrated and applied to hand and brain MRI. Two modifications to an existing CNN architecture are discussed, along with methods on addressing the aforementioned challenges. While most of the existing literature on medical image segmentation focuses on soft tissue and the major organs, this work is validated on data both from the central nervous system as well as the bones of the hand.