Aivar Sootla

LG
21 papers
215 citations
Novelty 51%
AI Score 26

21 Papers

LG · May 31, 2022
Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints

David Mguni, Aivar Sootla, Juliusz Ziomek et al. · Oxford

Many real-world settings involve costs for performing actions; transaction costs in financial systems and fuel costs are common examples. In these settings, performing actions at each time step quickly accumulates costs, leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces wear and tear and, ultimately, damage. Determining when to act is crucial for achieving successful outcomes, and yet the challenge of efficiently learning to behave optimally when actions incur minimally bounded costs remains unresolved. In this paper, we introduce a reinforcement learning (RL) framework named Learnable Impulse Control Reinforcement Algorithm (LICRA) for learning to optimally select both when to act and which actions to take when actions incur costs. At the core of LICRA is a nested structure that combines RL and a form of policy known as impulse control, which learns to maximise objectives when actions incur costs. We prove that LICRA, which seamlessly adopts any RL method, converges to policies that optimally select when to perform actions and their optimal magnitudes. We then augment LICRA to handle problems in which the agent can perform at most $k<\infty$ actions and, more generally, faces a budget constraint. We show LICRA learns the optimal value function and ensures budget constraints are satisfied almost surely. We demonstrate empirically LICRA's superior performance against benchmark RL methods in OpenAI gym's Lunar Lander and in Highway environments, and in a variant of the Merton portfolio problem within finance.

OC · Jun 28, 2018
An Optimal Control Formulation of Pulse-Based Control Using Koopman Operator

Aivar Sootla, Alexandre Mauroy, Damien Ernst

In many applications, and in systems/synthetic biology in particular, it is desirable to compute control policies that force the trajectory of a bistable system from one equilibrium (the initial point) to another equilibrium (the target point), or in other words to solve the switching problem. It was recently shown that, for monotone bistable systems, this problem admits easy-to-implement open-loop solutions in terms of temporal pulses (i.e., step functions of fixed length and fixed magnitude). In this paper, we develop this idea further and formulate a problem of convergence to an equilibrium from an arbitrary initial point. We show that this problem can be solved using a static optimization problem in the case of monotone systems. Changing the initial point to an arbitrary state allows us to build closed-loop, event-based or open-loop policies for the switching/convergence problems. In our derivations we exploit the Koopman operator, which offers a linear infinite-dimensional representation of an autonomous nonlinear system. One of the main advantages of using the Koopman operator is the powerful computational tools developed for this framework. Besides the presence of numerical solutions, the switching/convergence problem can also serve as a building block for solving more complicated control problems and can potentially be applied to non-monotone systems. We illustrate this argument on the problem of synchronizing cardiac cells by defibrillation. Potentially, our approach can be extended to problems with different parametrizations of control signals, since the only fundamental limitation is the finite-time application of the control signal.

SY · Sep 20, 2017
Block-Diagonal Solutions to Lyapunov Inequalities and Generalisations of Diagonal Dominance

Aivar Sootla, Yang Zheng, Antonis Papachristodoulou

Diagonally dominant matrices have many applications in systems and control theory. Linear dynamical systems with scaled diagonally dominant drift matrices, which include stable positive systems, allow for scalable stability analysis. For example, it is known that Lyapunov inequalities for this class of systems admit diagonal solutions. In this paper, we present an extension of scaled diagonal dominance to block partitioned matrices. We show that our definition describes matrices admitting block-diagonal solutions to Lyapunov inequalities and that these solutions can be computed using linear algebraic tools. We also show how in some cases the Lyapunov inequalities can be decoupled into a set of lower dimensional linear matrix inequalities, thus leading to improved scalability. We conclude by illustrating some advantages and limitations of our results with numerical examples.
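The statement that Lyapunov inequalities for stable positive systems admit diagonal solutions can be illustrated numerically. The sketch below is not the paper's block-diagonal construction; it uses the standard scalar argument for a Hurwitz Metzler matrix A: pick positive vectors xi and z with A xi < 0 and A^T z < 0, then P = diag(z_i / xi_i) satisfies A^T P + P A < 0. The function name `diagonal_lyapunov` and the example matrix are assumptions made for illustration.

```python
import numpy as np

def diagonal_lyapunov(a):
    """Return a diagonal P > 0 with A^T P + P A < 0 for a Hurwitz Metzler A."""
    n = a.shape[0]
    off_diagonal = a - np.diag(np.diag(a))
    if np.any(off_diagonal < 0):
        raise ValueError("expected a Metzler matrix (nonnegative off-diagonal)")
    ones = np.ones(n)
    xi = np.linalg.solve(a, -ones)    # A xi = -1, so A xi < 0
    z = np.linalg.solve(a.T, -ones)   # A^T z = -1, so A^T z < 0
    if np.any(xi <= 0) or np.any(z <= 0):
        raise ValueError("expected a Hurwitz matrix")
    return np.diag(z / xi)

# Hypothetical example: a stable positive (Metzler) system matrix.
A = np.array([[-3.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.5, 0.0, -1.0]])
P = diagonal_lyapunov(A)
lyap = A.T @ P + P @ A  # symmetric; should be negative definite
```

Only two linear solves are needed here instead of a semidefinite program, which is the kind of scalability the abstract refers to.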

SY · Mar 12, 2013
On Periodic Reference Tracking Using Batch-Mode Reinforcement Learning with Application to Gene Regulatory Network Control

Aivar Sootla, Natalja Strelkowa, Damien Ernst et al.

In this paper, we consider the periodic reference tracking problem in the framework of batch-mode reinforcement learning, which studies methods for solving optimal control problems from the sole knowledge of a set of trajectories. In particular, we extend an existing batch-mode reinforcement learning algorithm, known as Fitted Q Iteration, to the periodic reference tracking problem. The presented periodic reference tracking algorithm explicitly exploits a priori knowledge of the future values of the reference trajectory and its periodicity. We discuss the properties of our approach and illustrate it on the problem of reference tracking for a synthetic biology gene regulatory network known as the generalised repressilator. This system can produce decaying but long-lived oscillations, which makes it an interesting system for the tracking problem. In our companion paper we also take a look at the regulation problem of the toggle switch system, where the main goal is to drive the system's states to a specific bounded region in the state space.

LG · Sep 10, 2022
Structured Q-learning For Antibody Design

Alexander I. Cowen-Rivers, Philip John Gorinski, Aivar Sootla et al.

Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences. For example, one of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Even for modest antibody design problems, where proteins have a sequence length of eleven, we are faced with searching over 2.05 x 10^14 structures. Applying traditional Reinforcement Learning algorithms such as Q-learning to combinatorial optimization results in poor performance. We propose Structured Q-learning (SQL), an extension of Q-learning that incorporates structural priors for combinatorial optimization. Using a molecular docking simulator, we demonstrate that SQL finds high binding energy sequences and performs favourably against baselines on eight challenging antibody design tasks, including designing antibodies for SARS-COV.

OC · Mar 12, 2019
Block Factor-Width-Two Matrices in Semidefinite Programming

Aivar Sootla, Yang Zheng, Antonis Papachristodoulou

In this paper, we introduce a set of block factor-width-two matrices, which is a generalisation of factor-width-two matrices and is a subset of positive semidefinite matrices. The set of block factor-width-two matrices is a proper cone and we compute a closed-form expression for its dual cone. We use these cones to build hierarchies of inner and outer approximations of the cone of positive semidefinite matrices. The main feature of these cones is that they enable a decomposition of a large semidefinite constraint into a number of smaller semidefinite constraints. As the main application of these classes of matrices, we envision large-scale semidefinite feasibility optimisation programs including sum-of-squares (SOS) programs. We present numerical examples from SOS optimisation showcasing the properties of this decomposition.

LG · May 30, 2022
SEREN: Knowing When to Explore and When to Exploit

Changmin Yu, David Mguni, Dong Li et al.

Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states. To encourage exploration, recent approaches proposed adding stochasticity to actions, separating exploration and exploitation phases, or equating reduction in uncertainty with reward. However, these techniques do not necessarily offer entirely systematic approaches to making this trade-off. Here we introduce SElective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game between an RL agent -- Exploiter, which purely exploits known rewards, and another RL agent -- Switcher, which chooses at which states to activate a pure exploration policy that is trained to minimise system uncertainty and override Exploiter. Using a form of policies known as impulse control, Switcher is able to determine the best set of states to switch to the exploration policy while Exploiter is free to execute its actions everywhere else. We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation. Through extensive empirical studies in both discrete (MiniGrid) and continuous (MuJoCo) control benchmarks, we show that SEREN can be readily combined with existing RL algorithms to yield significant improvement in performance relative to state-of-the-art algorithms.

SY · May 8, 2017
Geometric Properties of Isostables and Basins of Attraction of Monotone Systems

Aivar Sootla, Alexandre Mauroy

In this paper, we study geometric properties of basins of attraction of monotone systems. Our results are based on a combination of monotone systems theory and spectral operator theory. We exploit the framework of the Koopman operator, which provides a linear infinite-dimensional description of nonlinear dynamical systems and spectral operator-theoretic notions such as eigenvalues and eigenfunctions. The sublevel sets of the dominant eigenfunction form a family of nested forward-invariant sets and the basin of attraction is the largest of these sets. The boundaries of these sets, called isostables, allow studying temporal properties of the system. Our first observation is that the dominant eigenfunction is increasing in every variable in the case of monotone systems. This is a strong geometric property which simplifies the computation of isostables. We also show how variations in basins of attraction can be bounded under parametric uncertainty in the vector field of monotone systems. Finally, we study the properties of the parameter set for which a monotone system is multistable. Our results are illustrated on several systems of two to four dimensions.

OC · Mar 22, 2016
Properties of Isostables and Basins of Attraction of Monotone Systems

Aivar Sootla, Alexandre Mauroy

In this paper, we investigate geometric properties of monotone systems by studying their isostables and basins of attraction. Isostables are boundaries of specific forward-invariant sets defined by the so-called Koopman operator, which provides a linear infinite-dimensional description of a nonlinear system. First, we study the spectral properties of the Koopman operator and the associated semigroup in the context of monotone systems. Our results generalize the celebrated Perron-Frobenius theorem to the nonlinear case and allow us to derive geometric properties of isostables and basins of attraction. Additionally, we show that under certain conditions we can characterize the bounds on the basins of attraction under parametric uncertainty in the vector field. We discuss computational approaches to estimate isostables and basins of attraction and illustrate the results on two and four state monotone systems.

OC · May 19, 2024
Properties of Eventually Positive Linear Input-Output Systems

Aivar Sootla

In this paper, we consider systems whose trajectories originate in the nonnegative orthant and become nonnegative after some finite time transient. First we consider dynamical systems (i.e., fully observable systems with no inputs), which we call eventually positive. We compute forward-invariant cones and Lyapunov functions for these systems. We then extend the notion of eventually positive systems to the input-output system case. Our extension is performed in such a manner that some valuable properties of classical internally positive input-output systems are preserved. For example, their induced norms can be computed using linear programming and the energy functions have nonnegative derivatives.

SY · May 20, 2016
Shaping Pulses to Control Bistable Monotone Systems Using Koopman Operator

Aivar Sootla, Alexandre Mauroy, Jorge Goncalves

In this paper, we further develop a recently proposed control method to switch a bistable system between its steady states using temporal pulses. The motivation for using pulses comes from biomedical and biological applications (e.g. synthetic biology), where it is generally difficult to build feedback control systems due to technical limitations in sensing and actuation. The original framework was derived for monotone systems and all the extensions relied on monotone systems theory. In contrast, we introduce the concept of switching function which is related to eigenfunctions of the so-called Koopman operator subject to a fixed control pulse. Using the level sets of the switching function we can (i) compute the set of all pulses that drive the system toward the steady state in a synchronous way and (ii) estimate the time needed by the flow to reach an epsilon neighborhood of the target steady state. Additionally, we show that for monotone systems the switching function is also monotone in some sense, a property that can yield efficient algorithms to compute it. This observation recovers and further extends the results of the original framework, which we illustrate on numerical examples inspired by biological applications.

LG · Jun 6, 2022
Effects of Safety State Augmentation on Safe Exploration

Aivar Sootla, Alexander I. Cowen-Rivers, Jun Wang et al.

Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is nonnegative if and only if the constraint is satisfied. The value of this state also serves as a distance toward constraint violation, while its initial value indicates the available safety budget. This idea allows us to derive policies for scheduling the safety budget during training. We call our approach Simmer (Safe policy IMproveMEnt for RL) to reflect the careful nature of these schedules. We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that "simmering" a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.

LG · Feb 14, 2022
Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee et al.

Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. We show that Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that Saute MDP allows viewing the Safe RL problem from a different perspective enabling new features. For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "Sauteed". Additionally, state augmentation allows for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.
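The state-augmentation idea can be sketched in a few lines. The code below is a simplified, undiscounted illustration and not the paper's exact formulation: a safety state z starts at 1 (the full budget), decreases by the normalised cost at each step, and the reward is replaced by a fixed penalty once z becomes negative. The names `saute_step`, `rollout`, `budget` and `unsafe_reward` are assumptions introduced for this example.

```python
def saute_step(z, reward, cost, budget, unsafe_reward=-1.0):
    """Update the safety state and reshape the reward (simplified sketch)."""
    # The safety state tracks the remaining fraction of the safety budget.
    z_next = z - cost / budget
    # Once the budget is exhausted, the task reward is replaced by a penalty.
    shaped_reward = reward if z_next >= 0.0 else unsafe_reward
    return z_next, shaped_reward

def rollout(rewards, costs, budget, unsafe_reward=-1.0):
    """Apply saute_step along a trajectory, starting from a full budget z = 1."""
    z = 1.0
    trace = []
    for reward, cost in zip(rewards, costs):
        z, shaped = saute_step(z, reward, cost, budget, unsafe_reward)
        trace.append((z, shaped))
    return trace

# Hypothetical trajectory: the third step overspends a budget of 4.
trace = rollout(rewards=[1.0, 1.0, 1.0], costs=[1.0, 2.0, 2.0], budget=4.0)
```

Because z is part of the observation handed to the agent, the wrapped problem can be passed to an unconstrained RL algorithm, which is the plug-and-play property described above.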

LG · Feb 14, 2022
Reinforcement Learning in Presence of Discrete Markovian Context Evolution

Hang Ren, Aivar Sootla, Taher Jafferjee et al.

We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications and we tackle it using a Bayesian approach and variational inference. We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for Markov process modeling. We then derive a context distillation procedure, which identifies and removes spurious contexts in an unsupervised fashion. We argue that the combination of these two components allows us to infer the number of contexts from data, thus dealing with the context cardinality assumption. We then find the representation of the optimal policy enabling efficient policy learning using off-the-shelf RL algorithms. Finally, we demonstrate empirically (using gym environments cart-pole swing-up, drone, intersection) that our approach succeeds where state-of-the-art methods of other frameworks fail and elaborate on the reasons for such failures.

LG · Oct 27, 2021
DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention

David Mguni, Usman Islam, Yaqi Sun et al.

Reinforcement learning (RL) involves performing exploratory actions in an unknown system. This can place a learning agent in dangerous and potentially catastrophic system states. Current approaches for tackling safe learning in RL simultaneously trade off safe exploration and task fulfillment. In this paper, we introduce a new generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by the safe policy. Our approach introduces a novel two-player framework for safe RL called Distributive Exploration Safety Training Algorithm (DESTA). The core of DESTA is a game between two adaptive agents: Safety Agent, which is delegated the task of minimising safety violations, and Task Agent, whose goal is to maximise the environment reward. Specifically, Safety Agent can selectively take control of the system at any given point to prevent safety violations while Task Agent is free to execute its policy at any other states. This framework enables Safety Agent to learn to take actions at certain states that minimise future safety violations, both during training and testing time, while Task Agent performs actions that maximise the task performance everywhere else. Theoretically, we prove that DESTA converges to stable points enabling safety violations of pretrained policies to be minimised. Empirically, we show DESTA's ability, first, to augment the safety of existing policies and, second, to construct safe RL policies when the Task Agent and Safety Agent are trained concurrently. We demonstrate DESTA's superior performance against leading RL methods in Lunar Lander and Frozen Lake from OpenAI gym.

ML · Jul 6, 2021
Viscos Flows: Variational Schur Conditional Sampling With Normalizing Flows

Vincent Moens, Aivar Sootla, Haitham Bou Ammar et al.

We present a method for conditional sampling for pre-trained normalizing flows when only part of an observation is available. We derive a lower bound to the conditioning variable log-probability using Schur complement properties in the spirit of Gaussian conditional sampling. Our derivation relies on partitioning the flow's domain in such a way that the flow restrictions to subdomains remain bijective, which is crucial for the Schur complement application. Simulation from the variational conditional flow then amounts to solving an equality constraint. Our contribution is three-fold: a) we provide detailed insights on the choice of variational distributions; b) we discuss how to partition the input space of the flow to preserve the bijectivity property; c) we propose a set of methods to optimise the variational distribution. Our numerical results indicate that our sampling method can be successfully applied to invertible residual networks for inference and classification.

CV · Oct 10, 2020
Diagnosing and Preventing Instabilities in Recurrent Video Processing

Thomas Tanay, Aivar Sootla, Matteo Maggioni et al.

Recurrent models are a popular choice for video enhancement tasks such as video denoising or super-resolution. In this work, we focus on their stability as dynamical systems and show that they tend to fail catastrophically at inference time on long video sequences. To address this issue, we (1) introduce a diagnostic tool which produces input sequences optimized to trigger instabilities and that can be interpreted as visualizations of temporal receptive fields, and (2) propose two approaches to enforce the stability of a model during training: constraining the spectral norm or constraining the stable rank of its convolutional layers. We then introduce Stable Rank Normalization for Convolutional layers (SRN-C), a new algorithm that enforces these constraints. Our experimental results suggest that SRN-C successfully enforces stability in recurrent video processing models without a significant performance loss.
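The two quantities named above, the spectral norm and the stable rank, are easy to compute for a plain weight matrix. The sketch below is not SRN-C and does not handle convolutional layers; it assumes a dense matrix W as a stand-in, defines the stable rank as the squared Frobenius norm divided by the squared spectral norm, and rescales W so that its spectral norm does not exceed a chosen bound. The function names are assumptions made for illustration.

```python
import numpy as np

def spectral_norm(w):
    """Largest singular value of a dense matrix."""
    return float(np.linalg.norm(w, ord=2))

def stable_rank(w):
    """Squared Frobenius norm divided by squared spectral norm."""
    return float(np.linalg.norm(w, ord="fro") ** 2) / spectral_norm(w) ** 2

def clip_spectral_norm(w, max_norm=1.0):
    """Rescale w so that its spectral norm is at most max_norm."""
    s = spectral_norm(w)
    return w if s <= max_norm else w * (max_norm / s)

# Hypothetical weight matrix with singular values 3 and 4.
W = np.diag([3.0, 4.0])
W_clipped = clip_spectral_norm(W)
```

Rescaling changes the spectral norm but leaves the stable rank untouched, so the two constraints mentioned in the abstract are distinct and would have to be enforced separately.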

LG · Jun 12, 2020
SAMBA: Safe Model-Based & Active Reinforcement Learning

Alexander I. Cowen-Rivers, Daniel Palenicek, Vincent Moens et al.

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders of magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.

SY · Aug 1, 2017
Pulse-Based Control Using Koopman Operator Under Parametric Uncertainty

Aivar Sootla, Damien Ernst

In applications such as biomedicine and systems/synthetic biology, technical limitations in actuation complicate implementation of time-varying control signals. In order to alleviate some of these limitations, it may be desirable to derive simple control policies, such as step functions with fixed magnitude and length (or temporal pulses). In this technical note, we further develop a recently proposed pulse-based solution to the convergence problem, i.e., minimizing the convergence time to the target exponentially stable equilibrium, for monotone systems. In particular, we extend this solution to monotone systems with parametric uncertainty. Our solutions also provide worst-case estimates on convergence times. Furthermore, we indicate how our tools can be used for a class of non-monotone systems, and more importantly how these tools can be extended to other control problems. We illustrate our approach on the problems of switching under parametric uncertainty and regulation around a saddle point in a genetic toggle switch system.

OC · Mar 28, 2014
Distributed Reconstruction of Nonlinear Networks: An ADMM Approach

Wei Pan, Aivar Sootla, Guy-Bart Stan

In this paper, we present a distributed algorithm for the reconstruction of large-scale nonlinear networks. In particular, we focus on the identification from time-series data of the nonlinear functional forms and associated parameters of large-scale nonlinear networks. Recently, a nonlinear network reconstruction problem was formulated as a nonconvex optimisation problem based on the combination of a marginal likelihood maximisation procedure with sparsity inducing priors. Using a convex-concave procedure (CCCP), an iterative reweighted lasso algorithm was derived to solve the initial nonconvex optimisation problem. By exploiting the structure of the objective function of this reweighted lasso algorithm, a distributed algorithm can be designed. To this end, we apply the alternating direction method of multipliers (ADMM) to decompose the original problem into several subproblems. To illustrate the effectiveness of the proposed methods, we use our approach to identify a network of interconnected Kuramoto oscillators with different network sizes (500 to 100,000 nodes).

SY · Mar 12, 2013
Toggling a Genetic Switch Using Reinforcement Learning

Aivar Sootla, Natalja Strelkowa, Damien Ernst et al.

In this paper, we consider the problem of optimal exogenous control of gene regulatory networks. Our approach consists in adapting an established reinforcement learning algorithm called the fitted Q iteration. This algorithm infers the control law directly from the measurements of the system's response to external control inputs without the use of a mathematical model of the system. The measurement data set can either be collected from wet-lab experiments or artificially created by computer simulations of dynamical models of the system. The algorithm is applicable to a wide range of biological systems due to its ability to deal with nonlinear and stochastic system dynamics. To illustrate the application of the algorithm to a gene regulatory network, the regulation of the toggle switch system is considered. The control objective of this problem is to drive the concentrations of two specific proteins to a target region in the state space.
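The batch-mode idea behind fitted Q iteration can be sketched on a toy problem. The code below is an illustration under strong assumptions, not the paper's setup: states and actions are discrete, the regression step is replaced by a per-(state, action) average of the targets, and the two-state batch is hypothetical. The algorithm only sees recorded transitions (s, a, r, s') and never queries a model of the system.

```python
from collections import defaultdict

def fitted_q_iteration(batch, actions, gamma=0.5, n_iters=50):
    """Tabular stand-in for fitted Q iteration on a fixed batch of transitions."""
    q = defaultdict(float)  # Q_0 is zero everywhere
    for _ in range(n_iters):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in batch:
            # Regression target built from the previous Q estimate.
            target = r + gamma * max(q[(s_next, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # "Fit" step: average the targets for each visited (state, action).
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q

# Hypothetical batch: action 0 stays, action 1 switches state;
# the reward is 1 whenever the next state is 1.
batch = [(0, 0, 0.0, 0), (0, 1, 1.0, 1), (1, 0, 1.0, 1), (1, 1, 0.0, 0)]
q = fitted_q_iteration(batch, actions=(0, 1))
policy = {s: max((0, 1), key=lambda a: q[(s, a)]) for s in (0, 1)}
```

In practice the averaging step would be replaced by a supervised regressor trained on the same targets, which is what lets the method handle continuous and stochastic dynamics.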