Tobias Sutter

OC
h-index61
19papers
98citations
Novelty54%
AI Score51

19 Papers

OCFeb 20, 2017
From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming

Peyman Mohajerin Esfahani, Tobias Sutter, Daniel Kuhn et al.

We consider linear programming (LP) problems in infinite dimensional spaces that are in general computationally intractable. Under suitable assumptions, we develop an approximation bridge from the infinite-dimensional LP to tractable finite convex programs in which the performance of the approximation is quantified explicitly. To this end, we adopt the recent developments in two areas of randomized optimization and first order methods, leading to a priori as well as a posterior performance guarantees. We illustrate the generality and implications of our theoretical results in the special case of the long-run average cost and discounted cost optimal control problems for Markov decision processes on Borel spaces. The applicability of the theoretical results is demonstrated through a constrained linear quadratic optimal control problem and a fisheries management problem.

OCJun 7, 2023
End-to-End Learning for Stochastic Optimization: A Bayesian Perspective

Yves Rychener, Daniel Kuhn, Tobias Sutter

We develop a principled approach to end-to-end learning in stochastic optimization. First, we show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map. Building on the insights of this analysis, we then propose new end-to-end learning algorithms for training decision maps that output solutions of empirical risk minimization and distributionally robust optimization problems, two dominant modeling paradigms in optimization under uncertainty. Numerical results for a synthetic newsvendor problem illustrate the key differences between alternative training schemes. We also investigate an economic dispatch problem based on real data to showcase the impact of the neural network architecture of the decision maps on their test performance.

DSNov 15, 2012
Isospectral flows on a class of finite-dimensional Jacobi matrices

Tobias Sutter, Debasish Chatterjee, Federico Ramponi et al.

We present a new matrix-valued isospectral ordinary differential equation that asymptotically block-diagonalizes $n\times n$ zero-diagonal Jacobi matrices employed as its initial condition. This o.d.e.\ features a right-hand side with a nested commutator of matrices, and structurally resembles the double-bracket o.d.e.\ studied by R.W.\ Brockett in 1991. We prove that its solutions converge asymptotically, that the limit is block-diagonal, and above all, that the limit matrix is defined uniquely as follows: For $n$ even, a block-diagonal matrix containing $2\times 2$ blocks, such that the super-diagonal entries are sorted by strictly increasing absolute value. Furthermore, the off-diagonal entries in these $2\times 2$ blocks have the same sign as the respective entries in the matrix employed as initial condition. For $n$ odd, there is one additional $1\times 1$ block containing a zero that is the top left entry of the limit matrix. The results presented here extend some early work of Kac and van Moerbeke.

LGJan 26, 2023
A Robust Optimisation Perspective on Counterexample-Guided Repair of Neural Networks

David Boetius, Stefan Leue, Tobias Sutter

Counterexample-guided repair aims at creating neural networks with mathematical safety guarantees, facilitating the application of neural networks in safety-critical domains. However, whether counterexample-guided repair is guaranteed to terminate remains an open question. We approach this question by showing that counterexample-guided repair can be viewed as a robust optimisation algorithm. While termination guarantees for neural network repair itself remain beyond our reach, we prove termination for more restrained machine learning models and disprove termination in a general setting. We empirically study the practical implications of our theoretical results, demonstrating the suitability of common verifiers and falsifiers for repair despite a disadvantageous theoretical result. Additionally, we use our theoretical insights to devise a novel algorithm for repairing linear regression models based on quadratic programming, surpassing existing approaches.

LGMay 22
Verified SHAP: Provable Bounds for Exact Shapley Values of Neural Networks

David Boetius, Shahaf Bassan, Guy Katz et al.

Shapley additive explanations (SHAP) are widely recognised as computationally intractable for neural networks, since they induce an exponential search space over the input features. In this work, we take a first step towards scaling exact SHAP computation to larger search spaces by introducing an algorithm that leverages recent advances in neural network verification to compute arbitrarily tight exact lower and upper bounds on SHAP values for neural networks, ultimately recovering the exact SHAP values. We demonstrate that our approach scales to orders of magnitude larger search spaces than state-of-the-art exact methods. This provides an important first step towards exact SHAP computation and establishes a principled cornerstone for evaluating statistical approximation methods on larger search spaces.

LGAug 20, 2024
Finding the DeepDream for Time Series: Activation Maximization for Univariate Time Series

Udo Schlegel, Daniel A. Keim, Tobias Sutter

Understanding how models process and interpret time series data remains a significant challenge in deep learning to enable applicability in safety-critical areas such as healthcare. In this paper, we introduce Sequence Dreaming, a technique that adapts Activation Maximization to analyze sequential information, aiming to enhance the interpretability of neural networks operating on univariate time series. By leveraging this method, we visualize the temporal dynamics and patterns most influential in model decision-making processes. To counteract the generation of unrealistic or excessively noisy sequences, we enhance Sequence Dreaming with a range of regularization techniques, including exponential smoothing. This approach ensures the production of sequences that more accurately reflect the critical features identified by the neural network. Our approach is tested on a time series classification dataset encompassing applications in predictive maintenance. The results show that our proposed Sequence Dreaming approach demonstrates targeted activation maximization for different use cases so that either centered class or border activation maximization can be generated. The results underscore the versatility of Sequence Dreaming in uncovering salient temporal features learned by neural networks, thereby advancing model transparency and trustworthiness in decision-critical domains.

LGOct 24, 2024
Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms

Felix Petersen, Christian Borgelt, Tobias Sutter et al.

When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that they are, per se, non-differentiable. A popular approach is to continuously relax the objectives to provide gradients, enabling learning. However, such differentiable relaxations are often non-convex and can exhibit vanishing and exploding gradients, making them (already in isolation) hard to optimize. Here, the loss function poses the bottleneck when training a deep neural network. We present Newton Losses, a method for improving the performance of existing hard to optimize losses by exploiting their second-order information via their empirical Fisher and Hessian matrices. Instead of training the neural network with second-order techniques, we only utilize the loss function's second-order information to replace it by a Newton Loss, while training the network with gradient descent. This makes our method computationally efficient. We apply Newton Losses to eight differentiable algorithms for sorting and shortest-paths, achieving significant improvements for less-optimized differentiable algorithms, and consistent improvements, even for well-optimized differentiable algorithms.

OCAug 20, 2025
Distributional Adversarial Attacks and Training in Deep Hedging

Guangyi He, Tobias Sutter, Lukas Gonon

In this paper, we study the robustness of classical deep hedging strategies under distributional shifts by leveraging the concept of adversarial attacks. We first demonstrate that standard deep hedging models are highly vulnerable to small perturbations in the input distribution, resulting in significant performance degradation. Motivated by this, we propose an adversarial training framework tailored to increase the robustness of deep hedging strategies. Our approach extends pointwise adversarial attacks to the distributional setting and introduces a computationally tractable reformulation of the adversarial optimization problem over a Wasserstein ball. This enables the efficient training of hedging strategies that are resilient to distributional perturbations. Through extensive numerical experiments, we show that adversarially trained deep hedging strategies consistently outperform their classical counterparts in terms of out-of-sample performance and resilience to model misspecification. Additional results indicate that the robust strategies maintain reliable performance on real market data and remain effective during periods of market change. Our findings establish a practical and effective framework for robust deep hedging under realistic market uncertainties.

LGJun 23, 2025
Reliability-Adjusted Prioritized Experience Replay

Leonard S. Pleiss, Tobias Sutter, Maximilian Schiffer

Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.

OCMay 7, 2025
A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance

Axel Friedrich Wolter, Tobias Sutter

We study reinforcement learning by combining recent advances in regularized linear programming formulations with the classical theory of stochastic approximation. Motivated by the challenge of designing algorithms that leverage off-policy data while maintaining on-policy exploration, we propose PGDA-RL, a novel primal-dual Projected Gradient Descent-Ascent algorithm for solving regularized Markov Decision Processes (MDPs). PGDA-RL integrates experience replay-based gradient estimation with a two-timescale decomposition of the underlying nested optimization problem. The algorithm operates asynchronously, interacts with the environment through a single trajectory of correlated data, and updates its policy online in response to the dual variable associated with the occupation measure of the underlying MDP. We prove that PGDA-RL converges almost surely to the optimal value function and policy of the regularized MDP. Our convergence analysis relies on tools from stochastic approximation theory and holds under weaker assumptions than those required by existing primal-dual RL approaches, notably removing the need for a simulator or a fixed behavioral policy.

OCMay 24, 2024
Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces

Angeliki Kamoutsi, Peter Schmitt-Förster, Tobias Sutter et al.

This work studies discrete-time discounted Markov decision processes with continuous state and action spaces and addresses the inverse problem of inferring a cost function from observed optimal behavior. We first consider the case in which we have access to the entire expert policy and characterize the set of solutions to the inverse problem by using occupation measures, linear duality, and complementary slackness conditions. To avoid trivial solutions and ill-posedness, we introduce a natural linear normalization constraint. This results in an infinite-dimensional linear feasibility problem, prompting a thorough analysis of its properties. Next, we use linear function approximators and adopt a randomized approach, namely the scenario approach and related probabilistic feasibility guarantees, to derive epsilon-optimal solutions for the inverse problem. We further discuss the sample complexity for a desired approximation accuracy. Finally, we deal with the more realistic case where we only have access to a finite set of expert demonstrations and a generative model and provide bounds on the error made when working with samples.

OCMay 3, 2024
Regularized Q-learning through Robust Averaging

Peter Schmitt-Förster, Tobias Sutter

We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.

OCMay 30, 2023
Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

Mengmeng Li, Daniel Kuhn, Tobias Sutter

We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that display statistical optimality properties and make optimal use of limited data often fail to be rectangular. Unfortunately, the corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable. We first present a randomized projected Langevin dynamics algorithm that solves the robust policy evaluation problem to global optimality but is inefficient. We also propose a deterministic policy gradient method that is efficient but solves the robust policy evaluation problem only approximately, and we prove that the approximation error scales with a new measure of non-rectangularity of the uncertainty set. Finally, we describe an actor-critic algorithm that finds an $ε$-optimal solution for the robust policy improvement problem in $\mathcal{O}(1/ε^4)$ iterations. We thus present the first complete solution scheme for robust MDPs with non-rectangular uncertainty sets offering global optimality guarantees. Numerical experiments show that our algorithms compare favorably against state-of-the-art methods.

LGMay 1, 2023
ISAAC Newton: Input-based Approximate Curvature for Newton's Method

Felix Petersen, Tobias Sutter, Christian Borgelt et al.

We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. We show that it is possible to compute a good conditioner based on only the input to a respective layer without a substantial computational overhead. The proposed method allows effective training even in small-batch stochastic regimes, which makes it competitive to first-order as well as second-order methods.

OCJun 12, 2021
Distributionally Robust Optimization with Markovian Data

Mengmeng Li, Tobias Sutter, Daniel Kuhn

We study a stochastic program where the probability distribution of the uncertain problem parameters is unknown and only indirectly observed via finitely many correlated samples generated by an unknown Markov chain with $d$ states. We propose a data-driven distributionally robust optimization model to estimate the problem's objective function and optimal solution. By leveraging results from large deviations theory, we derive statistical guarantees on the quality of these estimators. The underlying worst-case expectation problem is nonconvex and involves $\mathcal O(d^2)$ decision variables. Thus, it cannot be solved efficiently for large $d$. By exploiting the structure of this problem, we devise a customized Frank-Wolfe algorithm with convex direction-finding subproblems of size $\mathcal O(d)$. We prove that this algorithm finds a stationary point efficiently under mild conditions. The efficiency of the method is predicated on a dimensionality reduction enabled by a dual reformulation. Numerical experiments indicate that our approach has better computational and statistical properties than the state-of-the-art methods.

LGJun 8, 2021
Robust Generalization despite Distribution Shift via Minimum Discriminating Information

Tobias Sutter, Andreas Krause, Daniel Kuhn

Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution. We employ the principle of minimum discriminating information to embed the available prior knowledge, and use distributionally robust optimization to account for uncertainty due to the limited samples. By leveraging large deviation results, we obtain explicit generalization bounds with respect to the unknown shifted distribution. Lastly, we demonstrate the versatility of our framework by demonstrating it on two rather distinct applications: (1) training classifiers on systematically biased data and (2) off-policy evaluation in Markov Decision Processes.

OCAug 24, 2017
Generalized maximum entropy estimation

Tobias Sutter, David Sutter, Peyman Mohajerin Esfahani et al.

We consider the problem of estimating a probability distribution that maximizes the entropy while satisfying a finite number of moment constraints, possibly corrupted by noise. Based on duality of convex programming, we present a novel approximation scheme using a smoothed fast gradient method that is equipped with explicit bounds on the approximation error. We further demonstrate how the presented scheme can be used for approximating the chemical master equation through the zero-information moment closure method, and for an approximate dynamic programming approach in the context of constrained Markov decision processes with uncountable state and action spaces.

OCJun 7, 2017
On Infinite Linear Programming and the Moment Approach to Deterministic Infinite Horizon Discounted Optimal Control Problems

Angeliki Kamoutsi, Tobias Sutter, Peyman Mohajerin Esfahani et al.

We revisit the linear programming approach to deterministic, continuous time, infinite horizon discounted optimal control problems. In the first part, we relax the original problem to an infinite-dimensional linear program over a measure space and prove equivalence of the two formulations under mild assumptions, significantly weaker than those found in the literature until now. The proof is based on duality theory and mollification techniques for constructing approximate smooth subsolutions to the associated Hamilton-Jacobi-Bellman equation. In the second part, we assume polynomial data and use Lasserre's hierarchy of primal-dual moment-sum-of-squares semidefinite relaxations to approximate the value function and design an approximate optimal feedback controller. We conclude with an illustrative example.

OCAug 3, 2015
A variational approach to path estimation and parameter inference of hidden diffusion processes

Tobias Sutter, Arnab Ganguly, Heinz Koeppl

We consider a hidden Markov model, where the signal process, given by a diffusion, is only indirectly observed through some noisy measurements. The article develops a variational method for approximating the hidden states of the signal process given the full set of observations. This, in particular, leads to systematic approximations of the smoothing densities of the signal process. The paper then demonstrates how an efficient inference scheme, based on this variational approach to the approximation of the hidden states, can be designed to estimate the unknown parameters of stochastic differential equations. Two examples at the end illustrate the efficacy and the accuracy of the presented method.