Botond Cseke

ML
h-index: 44
14 papers
2,087 citations
Novelty: 50%
AI Score: 43

14 Papers

LG · Feb 26
Latent Matters: Learning Deep State-Space Models

Alexej Klushyn, Richard Kurle, Maximilian Soelch et al.

Deep state-space models (DSSMs) enable temporal predictions by learning the underlying dynamics of observed sequence data. They are often trained by maximising the evidence lower bound. However, as we show, this does not ensure that the model actually learns the underlying dynamics. We therefore propose a constrained optimisation framework as a general approach for training DSSMs. Building upon this, we introduce the extended Kalman VAE (EKVAE), which combines amortised variational inference with classic Bayesian filtering/smoothing to model dynamics more accurately than RNN-based DSSMs. Our results show that the constrained optimisation framework significantly improves system identification and prediction accuracy when applied to established state-of-the-art DSSMs. The EKVAE outperforms previous models in terms of prediction accuracy, achieves remarkable results in identifying dynamical systems, and can also learn state-space representations in which static and dynamic features are disentangled.
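
The EKVAE builds on classic Bayesian filtering. As a rough illustration of that ingredient only, the sketch below performs one predict/update step of a scalar extended Kalman filter. The transition `f`, its derivative `df`, the identity emission model and the noise variances `Q` and `R` are assumptions chosen for the example; none of the EKVAE's amortised inference, smoothing or constrained training is reproduced.

```python
import math


def ekf_step(m, P, x, f, df, Q, R):
    """One predict/update step of a scalar extended Kalman filter (illustrative).

    m, P  : filtered mean and variance of the latent state at time t-1
    x     : observation at time t, assumed to be x_t = z_t + N(0, R)
    f, df : assumed transition z_t = f(z_{t-1}) + N(0, Q) and its derivative
    """
    # Predict: push the mean through f and the variance through the local Jacobian.
    m_pred = f(m)
    F = df(m)
    P_pred = F * P * F + Q
    # Update: Kalman gain for the (assumed) identity emission model.
    K = P_pred / (P_pred + R)
    m_new = m_pred + K * (x - m_pred)
    P_new = (1.0 - K) * P_pred
    return m_new, P_new


# Usage: filter a short sequence under hypothetical nonlinear dynamics.
m, P = 0.0, 1.0
for x in [0.1, 0.3, 0.4]:
    m, P = ekf_step(m, P, x,
                    f=lambda z: z + 0.1 * math.sin(z),
                    df=lambda z: 1.0 + 0.1 * math.cos(z),
                    Q=0.01, R=0.1)
```

With a linear `f` this reduces to the ordinary Kalman filter; the extended variant differs only in linearising the transition through `df` at the current mean.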

LG · Jun 13, 2022
Local Distance Preserving Auto-encoders using Continuous k-Nearest Neighbours Graphs

Nutan Chen, Patrick van der Smagt, Botond Cseke

Auto-encoder models that preserve similarities in the data are a popular tool in representation learning. In this paper, we introduce several auto-encoder models that preserve local distances when mapping from the data space to the latent space. We use a local distance preserving loss based on the continuous k-nearest neighbours graph, which is known to capture topological features at all scales simultaneously. To improve training performance, we formulate learning as a constrained optimisation problem with local distance preservation as the main objective and reconstruction accuracy as a constraint. We generalise this approach to hierarchical variational auto-encoders, thereby learning generative models with geometrically consistent latent and data spaces. Our method achieves state-of-the-art performance across several standard datasets and evaluation metrics.
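
The continuous k-nearest neighbours (CkNN) graph connects two points when their distance is small relative to their local neighbourhood scales: an edge (i, j) exists if ‖x_i − x_j‖ < δ·√(ρ_k(x_i)·ρ_k(x_j)), where ρ_k is the distance to the k-th nearest neighbour. The sketch below builds that graph in pure Python and evaluates a simple squared mismatch of edge lengths between data and latent space. That penalty is an assumed stand-in, not necessarily the loss used in the paper, and the reconstruction constraint is omitted.

```python
import math


def cknn_edges(points, k, delta):
    """Edges (i, j), i < j, of the continuous k-nearest-neighbours graph."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho[i]: distance from point i to its k-th nearest neighbour
    # (index 0 of the sorted row is the zero distance to itself).
    rho = [sorted(row)[k] for row in dist]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] < delta * math.sqrt(rho[i] * rho[j])]


def local_distance_loss(data, latent, edges):
    """Hypothetical penalty: mean squared mismatch of edge lengths (data vs latent)."""
    if not edges:
        return 0.0
    return sum((math.dist(data[i], data[j]) - math.dist(latent[i], latent[j])) ** 2
               for i, j in edges) / len(edges)


data = [(0.0,), (1.0,), (2.0,), (10.0,)]
edges = cknn_edges(data, k=1, delta=1.5)
```

In this toy data set the isolated point at 10 receives no edges, so its latent position is unconstrained by the penalty; only the locally close pairs contribute.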

LG · May 20, 2025
FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning

Marvin Alles, Nutan Chen, Patrick van der Smagt et al.

The use of guidance to steer sampling toward desired outcomes has been widely explored within diffusion models, especially in applications such as image and trajectory generation. However, incorporating guidance during training remains relatively underexplored. In this work, we introduce energy-guided flow matching, a novel approach that enhances the training of flow models and eliminates the need for guidance at inference time. We learn a conditional velocity field corresponding to the flow policy by approximating an energy-guided probability path as a Gaussian path. Learning guided trajectories is appealing for tasks where the target distribution is defined by a combination of data and an energy function, as in reinforcement learning. Diffusion-based policies have recently attracted attention for their expressive power and ability to capture multi-modal action distributions. Typically, these policies are optimized using weighted objectives or by back-propagating gradients through actions sampled by the policy. As an alternative, we propose FlowQ, an offline reinforcement learning algorithm based on energy-guided flow matching. Our method achieves competitive performance while keeping policy training time constant in the number of flow sampling steps.
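
For readers unfamiliar with flow matching, the sketch below shows the plain (unguided) conditional flow-matching objective with a linear interpolation path x_t = (1 − t)·x0 + t·x1, whose target velocity is x1 − x0, together with an Euler sampler. The energy guidance and the Q-function of FlowQ are deliberately left out; the scalar data and the velocity model `v` are hypothetical. What it does illustrate is that each training example needs a single evaluation of `v`, independent of the number of integration steps used at sampling time.

```python
import random


def flow_matching_loss(v, pairs, rng):
    """Monte Carlo flow-matching loss for a scalar velocity model v(x, t).

    pairs: (x0, x1) samples, x0 from the source (e.g. noise), x1 from the data.
    """
    total = 0.0
    for x0, x1 in pairs:
        t = rng.random()
        xt = (1.0 - t) * x0 + t * x1  # point on the linear path
        target = x1 - x0              # its conditional velocity
        total += (v(xt, t) - target) ** 2
    return total / len(pairs)


def euler_sample(v, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with n_steps Euler steps."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x += dt * v(x, i * dt)
    return x


rng = random.Random(0)
pairs = [(rng.gauss(0.0, 1.0), 3.0) for _ in range(5)]
loss = flow_matching_loss(lambda x, t: 0.0, pairs, rng)
```

Sampling cost grows linearly with `n_steps`, whereas the training loss above never calls `euler_sample`.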

RO · Mar 22, 2024
Guided Decoding for Robot On-line Motion Generation and Adaption

Nutan Chen, Botond Cseke, Elie Aljalbout et al.

We present a novel motion generation approach for robot arms with many degrees of freedom that can adapt online to obstacles or new via-points in complex settings. Learning from Demonstration facilitates rapid adaptation to new tasks and optimizes the utilization of accumulated expertise by allowing robots to learn and generalize from demonstrated trajectories. We train a transformer architecture, based on a conditional variational autoencoder, on a large dataset of simulated trajectories used as demonstrations. Our architecture learns essential motion generation skills from these demonstrations and is able to adapt them to meet auxiliary tasks. Additionally, our approach implements auto-regressive motion generation to enable real-time adaptations, such as introducing or changing via-points and imposing velocity and acceleration constraints. Using beam search, we present a method for further adapting our motion generator to avoid obstacles. We show that our model successfully generates motion from different initial and target points and that it is capable of generating trajectories that navigate complex tasks across different robotic platforms.
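
The abstract uses beam search to steer the auto-regressive generator away from obstacles. A minimal sketch of that idea, under strong simplifying assumptions: a hypothetical `MOVES` table with fixed log-probabilities stands in for the learned transformer decoder, the obstacle is an axis-aligned box, and colliding candidates simply receive a large penalty rather than being scored by any criterion taken from the paper.

```python
import math

# Hypothetical stand-in for the learned auto-regressive decoder: from any point it
# proposes three unit moves with fixed log-probabilities.
MOVES = [((1, 0), math.log(0.6)), ((1, 1), math.log(0.2)), ((1, -1), math.log(0.2))]


def in_box(p, box):
    """True if point p lies in the axis-aligned box (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def beam_search(start, n_steps, beam_width, obstacle, penalty=1e6):
    """Keep the beam_width best partial trajectories, penalising collisions."""
    beams = [(0.0, [start])]  # (score, trajectory)
    for _ in range(n_steps):
        candidates = []
        for score, path in beams:
            x, y = path[-1]
            for (dx, dy), logp in MOVES:
                nxt = (x + dx, y + dy)
                cost = penalty if in_box(nxt, obstacle) else 0.0
                candidates.append((score + logp - cost, path + [nxt]))
        # Sort by score only (highest first) and truncate to the beam width.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


score, path = beam_search((0, 0), n_steps=4, beam_width=3, obstacle=(2, 3, -0.5, 0.5))
```

Without an obstacle in the way, the highest-probability trajectory is the straight one; with the box blocking y = 0 at x = 2 and 3, the beam keeps detours that step around it.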

RO · Jan 29, 2021
Constrained Probabilistic Movement Primitives for Robot Trajectory Adaptation

Felix Frank, Alexandros Paraschos, Patrick van der Smagt et al.

Placing robots outside controlled conditions requires versatile movement representations that allow robots to learn new tasks and adapt them to environmental changes. The introduction of obstacles, the placement of additional robots in the workspace, and the modification of the joint range due to faults or range-of-motion constraints are typical cases where adaptation capabilities play a key role in safely performing the robot's task. Probabilistic movement primitives (ProMPs) have been proposed for representing adaptable movement skills, which are modelled as Gaussian distributions over trajectories. These are analytically tractable and can be learned from a small number of demonstrations. However, both the original ProMP formulation and subsequent approaches only provide solutions to specific movement adaptation problems, e.g., obstacle avoidance, and a generic, unifying, probabilistic approach to adaptation is missing. In this paper, we develop a generic probabilistic framework for adapting ProMPs. We unify previous adaptation techniques, for example, various types of obstacle avoidance, via-points and mutual avoidance, in a single framework and combine them to solve complex robotic problems. Additionally, we derive novel adaptation techniques such as temporally unbound via-points and mutual avoidance. We formulate adaptation as a constrained optimisation problem where we minimise the Kullback-Leibler divergence between the adapted distribution and the distribution of the original primitive while constraining the probability mass associated with undesired trajectories to be low. We demonstrate our approach on several adaptation problems on simulated planar robot arms and 7-DOF Franka-Emika robots in a dual robot arm setting.
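
In the simplest possible instance of this formulation, a one-dimensional Gaussian N(μ0, σ²) over a single trajectory point, adaptation with a fixed variance and a single half-space of undesired values has a closed-form solution: KL(N(μ, σ²) ‖ N(μ0, σ²)) = (μ − μ0)² / (2σ²) is minimised by moving μ just far enough that P(y > b) ≤ ε. The sketch below assumes this toy setting (fixed `sigma`, one bound `bound`, tolerance `eps`); the paper's framework handles full trajectory distributions and more general constraints.

```python
from statistics import NormalDist


def adapt_mean(mu0, sigma, bound, eps):
    """Closed-form 1-D adaptation (toy setting, fixed variance).

    minimise  KL(N(mu, sigma^2) || N(mu0, sigma^2)) = (mu - mu0)^2 / (2 sigma^2)
    s.t.      P(y > bound) <= eps  under N(mu, sigma^2)
    """
    # P(y > bound) <= eps  <=>  mu <= bound - sigma * z_{1 - eps}
    z = NormalDist().inv_cdf(1.0 - eps)
    mu = min(mu0, bound - sigma * z)  # move the mean only if the constraint is active
    kl = (mu - mu0) ** 2 / (2.0 * sigma ** 2)
    return mu, kl


mu, kl = adapt_mean(mu0=0.0, sigma=1.0, bound=1.0, eps=0.05)
```

When the original primitive already satisfies the constraint, the mean is left unchanged and the divergence is zero.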

ML · Aug 23, 2019
Increasing the Generalisation Capacity of Conditional VAEs

Alexej Klushyn, Nutan Chen, Botond Cseke et al.

We address the problem of one-to-many mappings in supervised learning, where a single instance has many different solutions of possibly equal cost. The framework of conditional variational autoencoders describes a class of methods to tackle such structured-prediction tasks by means of latent variables. We propose to incentivise informative latent representations for increasing the generalisation capacity of conditional variational autoencoders. To this end, we modify the latent variable model by defining the likelihood as a function of the latent variable only and introduce an expressive multimodal prior that enables the model to capture semantically meaningful features of the data. To validate our approach, we train our model on the Cornell Robot Grasping dataset and on modified versions of MNIST and Fashion-MNIST, obtaining results that show significantly higher generalisation capability.
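
The abstract calls for an expressive multimodal prior. One common choice, assumed here purely for illustration, is a Gaussian mixture whose weights, means and standard deviations would be produced from the conditioning input by a network; the abstract does not say whether the paper uses this parameterisation. The sketch evaluates the mixture's log-density stably with the log-sum-exp trick, as needed, for example, in a Monte Carlo estimate of the KL term.

```python
import math


def log_normal(z, mean, std):
    """Log-density of N(mean, std**2) at z."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (z - mean) ** 2 / (2.0 * std ** 2)


def log_mixture_prior(z, weights, means, stds):
    """Log-density of a 1-D Gaussian mixture, computed with log-sum-exp.

    In a conditional VAE, weights/means/stds would be outputs of a prior network
    given the input x (hypothetical here; they are passed in directly).
    """
    terms = [math.log(w) + log_normal(z, m, s) for w, m, s in zip(weights, means, stds)]
    top = max(terms)  # subtract the largest term so exp() cannot overflow
    return top + math.log(sum(math.exp(t - top) for t in terms))


# A bimodal prior with two equally weighted modes at -2 and +2.
bimodal = ([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
at_mode = log_mixture_prior(2.0, *bimodal)
between = log_mixture_prior(0.0, *bimodal)
```

A standard normal prior would place its highest density at 0; the mixture instead assigns more density near its two modes than in the gap between them.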

ML · May 13, 2019
Learning Hierarchical Priors in VAEs

Alexej Klushyn, Nutan Chen, Richard Kurle et al.

We propose to learn a hierarchical prior in the context of variational autoencoders to avoid the over-regularisation resulting from a standard normal prior distribution. To incentivise an informative latent representation of the data, we formulate the learning problem as a constrained optimisation problem by extending the Taming VAEs framework to two-level hierarchical models. We introduce a graph-based interpolation method, which shows that the topology of the learned latent representation corresponds to the topology of the data manifold, and present several examples where desired properties of the latent representation, such as smoothness and simple explanatory factors, are learned by the prior.
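
Constrained formulations of this kind are commonly optimised through a Lagrangian, with gradient descent on the model parameters and gradient ascent on a non-negative multiplier. The toy sketch below applies that scheme to a scalar problem: minimise x² subject to (x − 3)² ≤ 1, whose optimum is x = 2 with multiplier 2. Here x² stands in for the KL term and the constraint for a reconstruction-error bound. The step sizes are assumptions, and the specific multiplier update of the Taming VAEs framework is not reproduced.

```python
def primal_dual(n_steps=20000, lr=0.01, lr_lam=0.01):
    """Gradient descent-ascent on L(x, lam) = f(x) + lam * g(x).

    Toy problem: minimise f(x) = x**2 subject to g(x) = (x - 3)**2 - 1 <= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(n_steps):
        # Descent on the primal variable.
        grad_x = 2.0 * x + lam * 2.0 * (x - 3.0)
        x -= lr * grad_x
        # Ascent on the multiplier: it grows while the constraint is violated
        # and is projected back onto lam >= 0.
        lam = max(0.0, lam + lr_lam * ((x - 3.0) ** 2 - 1.0))
    return x, lam


x_opt, lam_opt = primal_dual()
```

The multiplier acts as an automatically tuned weight: it rises until the constraint is just satisfied, which is what lets such frameworks avoid hand-tuning a fixed trade-off coefficient.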

COMP-PH · Jun 1, 2017
Efficient Low-Order Approximation of First-Passage Time Distributions

David Schnoerr, Botond Cseke, Ramon Grima et al.

We consider the problem of computing first-passage time distributions for reaction processes modelled by master equations. We show that this generally intractable class of problems is equivalent to a sequential Bayesian inference problem for an auxiliary observation process. The solution can be approximated efficiently by solving a closed set of coupled ordinary differential equations (for the low-order moments of the process) whose size scales with the number of species. We apply the method to an epidemic model and a trimerisation process and show good agreement with stochastic simulations.
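
To make "a closed set of ODEs for the low-order moments" concrete, consider the simplest reaction system, a birth-death process with production ∅ → X at rate k and degradation X → ∅ at rate γ·n. Its mean μ and variance σ² obey dμ/dt = k − γμ and dσ²/dt = k + γμ − 2γσ², which close exactly because the propensities are linear. The sketch integrates them with explicit Euler steps; it illustrates moment equations only and does not implement the paper's first-passage construction via an auxiliary observation process.

```python
def birth_death_moments(k, gamma, mean0, var0, t_end, dt=1e-3):
    """Euler integration of the exact moment ODEs of a birth-death process.

    Reactions: 0 -> X with rate k, X -> 0 with rate gamma * n.
    """
    mean, var = mean0, var0
    for _ in range(int(round(t_end / dt))):
        d_mean = k - gamma * mean
        d_var = k + gamma * mean - 2.0 * gamma * var
        mean, var = mean + dt * d_mean, var + dt * d_var
    return mean, var


# Starting from zero molecules, both moments relax to the Poisson value k / gamma.
mean, var = birth_death_moments(k=10.0, gamma=1.0, mean0=0.0, var0=0.0, t_end=20.0)
```

For bimolecular reactions, such as those in a trimerisation process, the equation for each moment involves higher moments, so a closure approximation is needed to obtain a finite system.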

ML · Jun 2, 2016
f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

Sebastian Nowozin, Botond Cseke, Ryota Tomioka

Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method allows such models to be trained through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing, more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models.
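
The variational divergence estimation approach rests on the lower bound D_f(P ‖ Q) ≥ sup_T (E_P[T(x)] − E_Q[f*(T(x))]), where f* is the convex conjugate of f. For the KL divergence, f(u) = u log u and f*(t) = exp(t − 1). The sketch estimates the bound by Monte Carlo for P = N(0, 1) and Q = N(1, 1), whose KL divergence is 0.5. In a GAN the function T would be the discriminator network; here it is fixed by hand, once as the optimal T(x) = 1 + log p(x)/q(x) = 1.5 − x and once as a deliberately poor constant.

```python
import math
import random


def kl_variational_bound(T, sample_p, sample_q, n, rng):
    """Monte Carlo estimate of E_P[T(x)] - E_Q[exp(T(x) - 1)] <= KL(P || Q)."""
    e_p = sum(T(sample_p(rng)) for _ in range(n)) / n
    e_q = sum(math.exp(T(sample_q(rng)) - 1.0) for _ in range(n)) / n
    return e_p - e_q


def sample_p(rng):
    return rng.gauss(0.0, 1.0)  # P = N(0, 1)


def sample_q(rng):
    return rng.gauss(1.0, 1.0)  # Q = N(1, 1)


rng = random.Random(0)
tight = kl_variational_bound(lambda x: 1.5 - x, sample_p, sample_q, 100_000, rng)
loose = kl_variational_bound(lambda x: 1.0, sample_p, sample_q, 100_000, rng)
```

Maximising this quantity over T tightens the bound; in f-GAN training the generator defining Q is simultaneously updated to minimise it.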

ML · Dec 18, 2015
Expectation propagation for continuous time stochastic processes

Botond Cseke, David Schnoerr, Manfred Opper et al.

We consider the inverse problem of reconstructing the posterior measure over the trajectories of a diffusion process from discrete-time observations and continuous-time constraints. We cast the problem in a Bayesian framework and derive approximations to the posterior distributions of single-time marginals using variational approximate inference. We then show how the approximation can be extended to a wide class of discrete-state Markov jump processes by making use of the chemical Langevin equation. Our empirical results show that the proposed method is computationally efficient and provides good approximations for these classes of inverse problems.
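
A core step of expectation propagation is moment matching: the Gaussian cavity distribution is multiplied by one non-Gaussian factor and the product is projected back onto a Gaussian. As a hypothetical example of a constraint factor, the sketch uses an indicator a ≤ x ≤ b on a single-time marginal, computes the moments of the resulting truncated Gaussian, and returns the updated site in natural parameters. How the paper defines its sites for diffusion paths is not shown in the abstract.

```python
from statistics import NormalDist


def ep_box_site(cav_mean, cav_var, a, b):
    """EP moment matching for a Gaussian cavity times the indicator a <= x <= b."""
    std = NormalDist()
    s = cav_var ** 0.5
    alpha, beta = (a - cav_mean) / s, (b - cav_mean) / s
    Z = std.cdf(beta) - std.cdf(alpha)  # mass of the cavity inside the box
    r = (std.pdf(alpha) - std.pdf(beta)) / Z
    # Mean and variance of the truncated (tilted) Gaussian.
    mean = cav_mean + s * r
    var = cav_var * (1.0 + (alpha * std.pdf(alpha) - beta * std.pdf(beta)) / Z - r ** 2)
    # Site approximation = tilted / cavity, expressed in natural parameters.
    site_prec = 1.0 / var - 1.0 / cav_var
    site_prec_mean = mean / var - cav_mean / cav_var
    return mean, var, site_prec, site_prec_mean


mean, var, site_prec, site_prec_mean = ep_box_site(0.0, 1.0, -1.0, 1.0)
```

The positive site precision shows that the constraint shrinks the marginal's uncertainty; in a full EP loop this site would be combined with the others and the cavity recomputed.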

LG · Jan 16, 2014
Properties of Bethe Free Energies and Message Passing in Gaussian Models

Botond Cseke, Tom Heskes

We address the problem of computing approximate marginals in Gaussian probabilistic models by using mean field and fractional Bethe approximations. We define the Gaussian fractional Bethe free energy in terms of the moment parameters of the approximate marginals, derive a lower and an upper bound on the fractional Bethe free energy and establish a necessary condition for the lower bound to be bounded from below. It turns out that the condition is identical to the pairwise normalizability condition, which is known to be a sufficient condition for the convergence of the message passing algorithm. We show that stable fixed points of the Gaussian message passing algorithm are local minima of the Gaussian Bethe free energy. Using a counterexample, we disprove the conjecture that unboundedness of the free energy implies divergence of the message passing algorithm.
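
Gaussian message passing (Gaussian belief propagation) on a model p(x) ∝ exp(−½ xᵀJx + hᵀx) exchanges messages that are themselves Gaussian, each summarised by a precision and a precision-weighted mean. The sketch runs synchronous updates on a small loopy model, a three-node cycle with hypothetical couplings, and reads off the marginals. When the algorithm converges, the means solve Jμ = h exactly, whereas the variances are in general only approximate on loopy graphs.

```python
def gaussian_bp(J, h, iters=100):
    """Synchronous Gaussian BP for p(x) proportional to exp(-0.5 x^T J x + h^T x)."""
    n = len(J)
    nbrs = [[j for j in range(n) if j != i and J[i][j] != 0.0] for i in range(n)]
    # prec[(i, j)], pot[(i, j)]: precision and precision-weighted mean of message i -> j
    prec = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    pot = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new_prec, new_pot = {}, {}
        for i, j in prec:
            # Combine node i's own potential with all incoming messages except j's.
            a = J[i][i] + sum(prec[(k, i)] for k in nbrs[i] if k != j)
            b = h[i] + sum(pot[(k, i)] for k in nbrs[i] if k != j)
            # Integrating out x_i yields a Gaussian message over x_j.
            new_prec[(i, j)] = -J[i][j] ** 2 / a
            new_pot[(i, j)] = -J[i][j] * b / a
        prec, pot = new_prec, new_pot
    means, variances = [], []
    for i in range(n):
        a = J[i][i] + sum(prec[(k, i)] for k in nbrs[i])
        b = h[i] + sum(pot[(k, i)] for k in nbrs[i])
        means.append(b / a)
        variances.append(1.0 / a)
    return means, variances


J = [[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]]
h = [1.0, 0.0, 0.0]
means, variances = gaussian_bp(J, h)
```

The couplings were chosen weak enough for the iteration to converge; with strong frustrated couplings the same updates may fail to converge.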

ML · May 17, 2013
Factored expectation propagation for input-output FHMM models in systems biology

Botond Cseke, Guido Sanguinetti

We consider the problem of jointly modelling metabolic signals and gene expression in systems biology applications. We propose an approach based on input-output factorial hidden Markov models, together with a structured variational inference approach for inferring the structure and states of the model. We start from the classical free-form structured variational mean field approach and use expectation propagation to approximate the expectations needed in the variational loop. We show that this corresponds to a form of factored, expectation-constrained approximate inference. We validate our model through extensive simulations and demonstrate its applicability on a real-world bacterial data set.
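
Structured mean field for factorial HMMs keeps each hidden chain's exact temporal structure and updates the chains one at a time; each update is a forward-backward pass whose effective emission terms include expectations over the other chains. The sketch shows only that inner forward-backward pass for a single chain with hypothetical inputs; the input-output coupling and the expectation-propagation step of the paper are not reproduced.

```python
def forward_backward(pi, A, lik):
    """Posterior state marginals of one HMM chain.

    pi[j]: initial probabilities, A[i][j]: transition i -> j,
    lik[t][j]: (effective) emission term for state j at time t.
    """
    T, S = len(lik), len(pi)
    # Forward pass, normalised at every step to avoid underflow.
    alpha = []
    a = [pi[j] * lik[0][j] for j in range(S)]
    for t in range(T):
        if t > 0:
            a = [sum(alpha[-1][i] * A[i][j] for i in range(S)) * lik[t][j] for j in range(S)]
        z = sum(a)
        alpha.append([x / z for x in a])
    # Backward pass, also normalised (the scale cancels in the posterior).
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(A[i][j] * lik[t + 1][j] * beta[t + 1][j] for j in range(S)) for i in range(S)]
        z = sum(b)
        beta[t] = [x / z for x in b]
    # Combine and normalise per time step.
    posterior = []
    for t in range(T):
        g = [alpha[t][j] * beta[t][j] for j in range(S)]
        z = sum(g)
        posterior.append([x / z for x in g])
    return posterior


A = [[0.9, 0.1], [0.1, 0.9]]
post = forward_backward([0.5, 0.5], A, [[0.9, 0.1], [1.0, 1.0], [1.0, 1.0]])
```

Evidence at the first time step propagates forward through the sticky transition matrix, so later marginals still favour state 0, with decreasing confidence.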

ML · May 17, 2013
Sparse Approximate Inference for Spatio-Temporal Point Process Models

Botond Cseke, Andrew Zammit Mangion, Tom Heskes et al.

Spatio-temporal point process models play a central role in the analysis of spatially distributed systems in several disciplines. Yet scalable inference remains computationally challenging, due both to the high-resolution modelling generally required and to the analytically intractable likelihood function. Here, we exploit the sparsity structure typical of (spatially) discretised log-Gaussian Cox process models by using approximate message-passing algorithms. The proposed algorithms scale well with the state dimension and the length of the temporal horizon with moderate loss in distributional accuracy. They hence provide a flexible and faster alternative to both non-linear filtering-smoothing type algorithms and to approaches that implement the Laplace method or expectation propagation on (block) sparse latent Gaussian models. We infer the parameters of the latent Gaussian model using a structured variational Bayes approach. We demonstrate the proposed framework on simulation studies with both Gaussian and point-process observations and use it to reconstruct the conflict intensity and dynamics in Afghanistan from the WikiLeaks Afghan War Diary.
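
The abstract positions its message-passing algorithms against approaches based on the Laplace method. For orientation, the sketch shows that method for a single cell of a discretised log-Gaussian Cox process: a Poisson count y in a cell of size `area`, with log-intensity f ~ N(m, v). Newton's method finds the posterior mode and the curvature there gives the Gaussian approximation. The single-cell setting and all parameter values are assumptions; the paper's sparse message-passing scheme couples many such cells through the latent Gaussian field.

```python
import math


def laplace_cell(y, area, m, v, iters=50):
    """Laplace approximation for one cell of a discretised log-Gaussian Cox process.

    y: event count in the cell, area: cell size, f ~ N(m, v): log-intensity prior.
    log posterior (up to a constant): y*f - area*exp(f) - (f - m)**2 / (2*v)
    """
    f = m
    for _ in range(iters):
        grad = y - area * math.exp(f) - (f - m) / v
        hess = -area * math.exp(f) - 1.0 / v  # always negative: the target is concave
        f -= grad / hess                      # Newton step towards the mode
    hess = -area * math.exp(f) - 1.0 / v
    return f, -1.0 / hess  # Gaussian approximation N(mode, -1 / hessian)


f_mode, var_post = laplace_cell(y=3, area=1.0, m=0.0, v=1.0)
```

Because the log posterior is concave in f, Newton's method converges reliably here, and the approximate posterior variance is always smaller than the prior variance v.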

LG · Jun 13, 2012
Bounds on the Bethe Free Energy for Gaussian Networks

Botond Cseke, Tom Heskes

We address the problem of computing approximate marginals in Gaussian probabilistic models by using mean field and fractional Bethe approximations. As an extension of Welling and Teh (2001), we define the Gaussian fractional Bethe free energy in terms of the moment parameters of the approximate marginals and derive an upper and lower bound for it. We give necessary conditions for the Gaussian fractional Bethe free energies to be bounded from below. It turns out that the bounding condition is the same as the pairwise normalizability condition derived by Malioutov et al. (2006) as a sufficient condition for the convergence of the message passing algorithm. By giving a counterexample, we disprove the conjecture in Welling and Teh (2001): even when the Bethe free energy is not bounded from below, it can possess a local minimum to which the minimization algorithms can converge.
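
The pairwise normalizability condition can be checked numerically. The sketch relies on the equivalence, shown by Malioutov et al. (2006), between pairwise normalizability and walk-summability: the spectral radius of |R| must be below 1, where R = I − D^(−1/2) J D^(−1/2) is built from the precision matrix J, and its off-diagonal entries are the partial correlations. The example matrices are hypothetical. Both are positive definite (the second has eigenvalues 2.2, 0.4 and 0.4) and therefore define valid Gaussians, but only the one with weaker couplings passes the test.

```python
def walk_summability_radius(J, iters=500):
    """Spectral radius of |R|, R = I - D^{-1/2} J D^{-1/2} (D = diagonal of J).

    J is walk-summable (equivalently, pairwise normalizable) when the result is < 1.
    """
    n = len(J)
    d = [J[i][i] ** 0.5 for i in range(n)]
    abs_r = [[0.0 if i == j else abs(J[i][j]) / (d[i] * d[j]) for j in range(n)]
             for i in range(n)]
    # Power iteration on I + |R|: same Perron vector as |R|, and the shift makes
    # its Perron eigenvalue strictly dominant, so the iteration converges.
    v = [1.0] * n
    est = 1.0
    for _ in range(iters):
        w = [v[i] + sum(abs_r[i][j] * v[j] for j in range(n)) for i in range(n)]
        est = max(w) / max(v)
        top = max(w)
        v = [x / top for x in w]
    return est - 1.0


weak = [[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]]
strong = [[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]]
r_weak = walk_summability_radius(weak)
r_strong = walk_summability_radius(strong)
```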