Thomas Schön

ML · 8 papers · 113 citations

8 Papers

SY · Mar 10, 2015
On the exponential convergence of the Kaczmarz algorithm

Liang Dai, Thomas Schön

The Kaczmarz algorithm (KA) is a popular method for solving a system of linear equations. In this note we derive a new exponential convergence result for the KA. The key step in establishing the new result is to rewrite the KA so that its solution path can be interpreted as the output of a particular dynamical system. Asymptotic stability results for the corresponding dynamical system can then be leveraged to prove exponential convergence of the KA. The new bound is also compared to existing bounds.
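For readers new to the KA, the sketch below shows the textbook cyclic variant in NumPy; it is an illustration, not the authors' code, and the paper's analysis may concern a different ordering of rows. Each inner step projects the iterate onto the hyperplane of one equation, x <- x + (b_i - a_i^T x) / ||a_i||^2 * a_i. Written out, one step is the affine map x <- (I - a_i a_i^T / ||a_i||^2) x + a_i b_i / ||a_i||^2, which suggests how the iteration can be read as a dynamical system. The function name `kaczmarz` and the `sweeps` parameter are our own choices.

```python
import numpy as np


def kaczmarz(A, b, x0=None, sweeps=100):
    """Cyclic Kaczmarz iteration for a consistent system A x = b.

    Each inner step projects x onto the hyperplane {x : A[i] @ x = b[i]}.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # ||a_i||^2 for every row
    for _ in range(sweeps):
        for i in range(m):
            residual = b[i] - A[i] @ x
            x += (residual / row_norms_sq[i]) * A[i]
    return x


# Usage: a small consistent 2x2 system whose exact solution is [1, -1]
A = np.array([[3.0, 1.0], [1.0, 2.0]])
x_true = np.array([1.0, -1.0])
b = A @ x_true
x_hat = kaczmarz(A, b)  # should be very close to x_true
```

For this 2x2 example the error shrinks by a roughly constant factor per sweep, which is the exponential behaviour the abstract refers to; the rate depends on the angles between the rows.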

LG · Feb 22, 2021
A Probabilistically Motivated Learning Rate Adaptation for Stochastic Optimization

Filip de Roos, Carl Jidling, Adrian Wills et al.

Machine learning practitioners invest significant manual and computational resources in finding suitable learning rates for optimization algorithms. We provide a probabilistic motivation, in terms of Gaussian inference, for popular stochastic first-order methods. As an important special case, this motivation recovers the Polyak step with a general metric. The inference allows us to relate the learning rate to a dimensionless quantity that can be automatically adapted during training by a control algorithm. The resulting meta-algorithm is shown to adapt learning rates in a robust manner across a large range of initial values when applied to deep learning benchmark problems.
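As background for the Polyak step mentioned above: the classic rule picks the step length so that a first-order model of f reaches a known (or estimated) optimal value f*, giving alpha = (f(x) - f*) / ||grad f(x)||^2. The sketch below adds an optional symmetric positive definite preconditioner P, so that alpha = (f(x) - f*) / (g^T P g) and the step is -alpha * P g. Treating P as the "general metric" is our assumption about what the abstract means; the function name `polyak_step` and the quadratic test problem are illustrative, and this is not the paper's adaptive meta-algorithm.

```python
import numpy as np


def polyak_step(x, f, grad, f_star=0.0, P=None):
    """One Polyak step x - alpha * P g with alpha = (f(x) - f_star) / (g^T P g).

    P is an optional symmetric positive definite preconditioner (identity if None).
    """
    g = grad(x)
    Pg = g if P is None else P @ g
    denom = g @ Pg
    if denom == 0.0:
        return x  # zero gradient: already stationary
    alpha = (f(x) - f_star) / denom
    return x - alpha * Pg


# Usage: ill-conditioned quadratic f(x) = 0.5 x^T H x with known minimum value f* = 0
H = np.diag([1.0, 100.0])


def f(x):
    return 0.5 * x @ H @ x


def grad(x):
    return H @ x


x_plain = np.array([3.0, 4.0])
x_precond = np.array([3.0, 4.0])
for _ in range(60):
    x_plain = polyak_step(x_plain, f, grad)
    x_precond = polyak_step(x_precond, f, grad, P=np.linalg.inv(H))
```

With P equal to the inverse Hessian, every step halves x exactly on this quadratic, while the unpreconditioned step only guarantees that the distance to the minimiser decreases. The example is meant to show what a metric changes, not how the paper adapts the learning rate.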

ML · Dec 14, 2020
Variational State and Parameter Estimation

Jarrad Courts, Johannes Hendriks, Adrian Wills et al.

This paper considers the problem of computing Bayesian estimates of both states and model parameters for nonlinear state-space models. Generally, this problem does not have a tractable solution and approximations must be utilised. In this work, a variational approach is used to provide an assumed density which approximates the desired, intractable, distribution. The approach is deterministic and results in an optimisation problem of a standard form. Due to the selected parametrisation of the assumed density, first- and second-order derivatives are readily available, which allows for efficient solutions. The proposed method is compared against state-of-the-art Hamiltonian Monte Carlo in two numerical examples.

ML · Dec 8, 2020
Variational System Identification for Nonlinear State-Space Models

Jarrad Courts, Adrian Wills, Thomas Schön et al.

This paper considers parameter estimation for nonlinear state-space models, which is an important but challenging problem. We address this challenge by employing a variational inference (VI) approach, which is a principled method that has deep connections to maximum likelihood estimation. This VI approach ultimately provides estimates of the model as solutions to an optimisation problem, which is deterministic, tractable and can be solved using standard optimisation tools. A specialisation of this approach for systems with additive Gaussian noise is also detailed. The proposed method is examined numerically on a range of simulated and real examples, focusing on robustness to parameter initialisation; additionally, it compares favourably against state-of-the-art alternatives.
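To illustrate the connection between VI and maximum likelihood mentioned above, consider a toy conjugate model that is far simpler than a nonlinear state-space model (our assumption, chosen so everything is available in closed form): y ~ N(theta, sigma^2), prior theta ~ N(0, tau^2), and a Gaussian variational density q(theta) = N(m, s^2). The evidence lower bound (ELBO) never exceeds the log marginal likelihood log p(y), and equals it exactly when q is the true posterior. The function names and numbers below are illustrative.

```python
import numpy as np


def elbo(m, s, y, sigma, tau):
    """Closed-form ELBO for y ~ N(theta, sigma^2), theta ~ N(0, tau^2), q(theta) = N(m, s^2)."""
    exp_loglik = -0.5 * np.log(2 * np.pi * sigma**2) - ((y - m) ** 2 + s**2) / (2 * sigma**2)
    exp_logprior = -0.5 * np.log(2 * np.pi * tau**2) - (m**2 + s**2) / (2 * tau**2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)  # entropy of q
    return exp_loglik + exp_logprior + entropy


def log_evidence(y, sigma, tau):
    """Exact log p(y): marginally y ~ N(0, sigma^2 + tau^2)."""
    var = sigma**2 + tau**2
    return -0.5 * np.log(2 * np.pi * var) - y**2 / (2 * var)


def exact_posterior(y, sigma, tau):
    """Conjugate posterior p(theta | y) = N(mean, std^2), returned as (mean, std)."""
    var = 1.0 / (1.0 / tau**2 + 1.0 / sigma**2)
    return var * y / sigma**2, np.sqrt(var)


y, sigma, tau = 1.5, 0.8, 2.0
m_post, s_post = exact_posterior(y, sigma, tau)
bound_at_posterior = elbo(m_post, s_post, y, sigma, tau)
bound_elsewhere = elbo(0.0, 1.0, y, sigma, tau)
exact = log_evidence(y, sigma, tau)
```

Because ELBO = log p(y) - KL(q || p(theta | y)), maximising it over q tightens a lower bound on the log-likelihood; in parameter estimation the same bound is also maximised over the model parameters (here sigma and tau would play that role).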

ME · Mar 13, 2020
The Elliptical Processes: a Family of Fat-tailed Stochastic Processes

Maria Bånkestad, Jens Sjölund, Jalil Taghia et al.

We present the elliptical processes, a family of non-parametric probabilistic models that subsumes the Gaussian process and the Student-t process. This generalization includes a range of new fat-tailed behaviors yet retains computational tractability. We base the elliptical processes on a representation of elliptical distributions as a continuous mixture of Gaussian distributions and derive closed-form expressions for the marginal and conditional distributions. We perform numerical experiments on robust regression using an elliptical process defined by a piecewise constant mixing distribution, and show advantages compared with a Gaussian process. The elliptical processes may become a replacement for Gaussian processes in several settings, including when the likelihood is not Gaussian or when accurate tail modeling is critical.
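The continuous-mixture representation described above can be sketched as f = sqrt(xi) * z, where z ~ N(0, K) is a Gaussian-process draw at a finite set of inputs and a single scale xi is drawn from a mixing distribution. Two mixing choices are shown: an inverse-gamma distribution InvGamma(nu/2, nu/2), which is the standard construction of the multivariate Student-t, and a simple piecewise-constant density on xi. The RBF kernel, the bin edges and weights, and all function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def rbf_kernel(t, lengthscale=1.0):
    """Squared-exponential covariance matrix for 1-D inputs t."""
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def sample_scale_mixture(K, mixing_sampler, n_samples, rng):
    """Draw f = sqrt(xi) * z with z ~ N(0, K) and one xi per sample path."""
    L = np.linalg.cholesky(K + 1e-8 * np.eye(K.shape[0]))  # small jitter for stability
    z = rng.standard_normal((n_samples, K.shape[0])) @ L.T
    xi = mixing_sampler(n_samples, rng)
    return np.sqrt(xi)[:, None] * z


def inverse_gamma_mixing(nu):
    """xi ~ InvGamma(nu/2, nu/2): the mixture is a Student-t with nu degrees of freedom."""
    return lambda n, rng: 1.0 / rng.gamma(nu / 2, 2.0 / nu, size=n)


def piecewise_constant_mixing(edges, weights):
    """Hypothetical piecewise-constant density on xi; weights are probability masses per bin."""
    edges = np.asarray(edges, dtype=float)
    p = np.asarray(weights, dtype=float) / np.sum(weights)

    def sampler(n, rng):
        idx = rng.choice(len(p), size=n, p=p)  # pick a bin by its mass
        return rng.uniform(edges[idx], edges[idx + 1])  # uniform within the bin

    return sampler


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 5)
K = rbf_kernel(t, lengthscale=0.5)
f_student = sample_scale_mixture(K, inverse_gamma_mixing(10.0), 200_000, rng)
f_piecewise = sample_scale_mixture(
    K, piecewise_constant_mixing([0.5, 1.0, 2.0], [0.5, 0.5]), 200_000, rng
)
```

Since the marginal covariance of the mixture is E[xi] K, the sampled variances should be close to 1.25 (Student-t with nu = 10) and 1.125 (the piecewise-constant example) at each input.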

ML · Feb 5, 2020
Linearly Constrained Neural Networks

Johannes Hendriks, Carl Jidling, Adrian Wills et al.

We present a novel approach to modelling and learning vector fields from physical systems using neural networks that explicitly satisfy known linear operator constraints. To achieve this, the target function is modelled as a linear transformation of an underlying potential field, which is in turn modelled by a neural network. This transformation is chosen such that any prediction of the target function is guaranteed to satisfy the constraints. The approach is demonstrated on both simulated and real data examples.
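A concrete instance of this construction is a 2-D divergence-free vector field (div f = 0). Choosing the linear transformation f = (d psi/d x2, -d psi/d x1) of a scalar potential psi guarantees div f = 0 for any psi, because mixed partial derivatives commute. The sketch below uses a tiny, untrained one-hidden-layer tanh network as psi and writes its gradient out by hand rather than using an autodiff library; the network size, the random weights, and the function names are arbitrary assumptions, and training is omitted. A central finite-difference check confirms that the divergence vanishes up to numerical error.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = 0.5 * rng.standard_normal((16, 2))  # hypothetical untrained weights of the potential network
b1 = 0.5 * rng.standard_normal(16)
w2 = 0.5 * rng.standard_normal(16)


def potential_grad(x):
    """Gradient of psi(x) = w2 . tanh(W1 x + b1) with respect to x (shape (2,))."""
    h = np.tanh(W1 @ x + b1)
    return W1.T @ (w2 * (1.0 - h**2))


def vector_field(x):
    """f = (d psi/d x2, -d psi/d x1): divergence-free for any choice of weights."""
    g = potential_grad(x)
    return np.array([g[1], -g[0]])


def divergence_fd(field, x, h=1e-5):
    """Central finite-difference estimate of d f1/d x1 + d f2/d x2 at x."""
    e1, e2 = np.array([h, 0.0]), np.array([0.0, h])
    d1 = (field(x + e1)[0] - field(x - e1)[0]) / (2 * h)
    d2 = (field(x + e2)[1] - field(x - e2)[1]) / (2 * h)
    return d1 + d2
```

The constraint holds by construction rather than being encouraged through a penalty term, so it stays satisfied whatever values training assigns to the weights.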

SY · Sep 3, 2019
Stochastic quasi-Newton with line-search regularization

Adrian Wills, Thomas Schön

In this paper we present a novel quasi-Newton algorithm for use in stochastic optimisation. Quasi-Newton methods have had an enormous impact on deterministic optimisation problems because they afford rapid convergence and computationally attractive algorithms. In essence, this is achieved by learning the second-order (Hessian) information based on observing first-order gradients. We extend these ideas to the stochastic setting by employing a highly flexible model for the Hessian and inferring its value based on observing noisy gradients. In addition, we propose a stochastic counterpart to standard line-search procedures and demonstrate the utility of this combination on maximum likelihood identification for general nonlinear state-space models.
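For reference, here is the standard deterministic procedure that the paper's stochastic line search is a counterpart of: backtracking until the Armijo sufficient-decrease condition f(x + alpha d) <= f(x) + c1 * alpha * g^T d holds. This is the textbook version, not the stochastic procedure proposed in the paper; the defaults (c1 = 1e-4, shrink factor 0.5) are conventional choices, and the Rosenbrock test function is our own example.

```python
import numpy as np


def backtracking_armijo(f, x, g, d, alpha0=1.0, c1=1e-4, shrink=0.5, max_iter=50):
    """Standard deterministic backtracking line search.

    Shrinks alpha until f(x + alpha d) <= f(x) + c1 * alpha * g^T d (Armijo condition).
    If max_iter is exhausted, the last (smallest) alpha is returned unchecked.
    """
    slope = g @ d  # directional derivative of f at x along d
    if slope >= 0:
        raise ValueError("d must be a descent direction (g @ d < 0)")
    fx = f(x)
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * d) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= shrink
    return alpha


def rosenbrock(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2


def rosenbrock_grad(x):
    return np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ])


# Usage: one steepest-descent step from the classic starting point (-1.2, 1)
x = np.array([-1.2, 1.0])
g = rosenbrock_grad(x)
d = -g
alpha = backtracking_armijo(rosenbrock, x, g, d)
```

With noisy gradients and function values the Armijo test above can no longer be evaluated exactly, which is the difficulty a stochastic counterpart has to address.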

ML · Feb 12, 2018
Stochastic quasi-Newton with adaptive step lengths for large-scale problems

Adrian Wills, Thomas Schön

We provide a numerically robust and fast method capable of exploiting the local geometry when solving large-scale stochastic optimisation problems. Our key innovation is an auxiliary variable construction coupled with an inverse Hessian approximation computed using a receding history of iterates and gradients. It is the Markov chain nature of the classic stochastic gradient algorithm that enables this development. The construction offers a mechanism for stochastic line search adapting the step length. We numerically evaluate and compare against the current state of the art, with encouraging performance on real-world benchmark problems where the number of observations and unknowns is on the order of millions.
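An inverse Hessian approximation built from a receding history of iterates and gradients is in the spirit of limited-memory BFGS. As a point of reference, the sketch below implements the standard L-BFGS two-loop recursion over a fixed-length deque of pairs s = x_{k+1} - x_k, y = g_{k+1} - g_k, and drives it on a small quadratic with an exact line search. It is deterministic and does not include the paper's auxiliary-variable construction or stochastic line search; the test problem, the memory length of 3, and the names are our own assumptions.

```python
import numpy as np
from collections import deque


def two_loop_direction(grad, history):
    """Return -H grad, where H is the L-BFGS inverse-Hessian approximation built
    from the (s, y) pairs in `history` (oldest first) by the standard two-loop recursion."""
    q = np.array(grad, dtype=float)
    saved = []
    for s, y in reversed(history):  # first loop: newest pair to oldest
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        saved.append((a, rho, s, y))
    if len(history) > 0:
        s, y = history[-1]
        q *= (s @ y) / (y @ y)  # common initial scaling H0 = (s^T y / y^T y) I
    for a, rho, s, y in reversed(saved):  # second loop: oldest pair to newest
        beta = rho * (y @ q)
        q += (a - beta) * s
    return -q


# Usage: minimise 0.5 x^T A x - b^T x (test problem chosen for illustration)
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)  # symmetric positive definite
b = rng.standard_normal(5)
history = deque(maxlen=3)  # receding memory: only the 3 most recent pairs are kept
x = np.zeros(5)
g = A @ x - b
for _ in range(200):
    if np.linalg.norm(g) < 1e-10:
        break
    d = two_loop_direction(g, history)
    alpha = -(g @ d) / (d @ A @ d)  # exact line search; valid only for a quadratic
    x_new = x + alpha * d
    g_new = A @ x_new - b
    s, y = x_new - x, g_new - g
    if s @ y > 0:  # keep only pairs satisfying the curvature condition
        history.append((s, y))
    x, g = x_new, g_new
```

With a single stored pair, the recursion maps y to -s, i.e. the implied inverse Hessian satisfies the secant condition H y = s; the memory length trades curvature information against storage, which is what makes the approach viable when the number of unknowns is large.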