Tobias J. Oechtering

IT
h-index: 28
16 papers
82 citations
Novelty: 45%
AI Score: 52

16 Papers

SY · May 1, 2016
Uncertain Wiretap Channels and Secure Estimation

Moritz Wiese, Karl Henrik Johansson, Tobias J. Oechtering et al.

Uncertain wiretap channels are introduced. Their zero-error secrecy capacity is defined. If the sensor-estimator channel is perfect, it is also calculated. Further properties are discussed. The problem of estimating a dynamical system with nonstochastic disturbances is studied where the sensor is connected to the estimator and an eavesdropper via an uncertain wiretap channel. The estimator should obtain a uniformly bounded estimation error whereas the eavesdropper's error should tend to infinity. It is proved that the system can be estimated securely if the zero-error capacity of the sensor-estimator channel is strictly larger than the logarithm of the system's unstable pole and the zero-error secrecy capacity of the uncertain wiretap channel is positive.
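The sufficient condition in the last sentence can be written compactly. The symbols are assumptions made here for illustration and are not taken from the abstract: $C_0$ denotes the zero-error capacity of the sensor-estimator channel, $C_S$ the zero-error secrecy capacity of the uncertain wiretap channel, and $\lambda > 1$ the unstable pole of the system.

```latex
C_0 > \log \lambda \quad \text{and} \quad C_S > 0
```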

ML · Apr 26, 2023
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering et al.

In this work, we study the performance of the Thompson Sampling algorithm for Contextual Bandit problems based on the framework introduced by Neu et al. and their concept of lifted information ratio. First, we prove a comprehensive bound on the Thompson Sampling expected cumulative regret that depends on the mutual information of the environment parameters and the history. Then, we introduce new bounds on the lifted information ratio that hold for sub-Gaussian rewards, thus generalizing the results from Neu et al., whose analysis requires binary rewards. Finally, we provide explicit regret bounds for the special cases of unstructured bounded contextual bandits, structured bounded contextual bandits with Laplace likelihood, structured Bernoulli bandits, and bounded linear contextual bandits.

LG · Jul 18, 2022
An Information-Theoretic Analysis of Bayesian Reinforcement Learning

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering et al.

Building on the framework introduced by Xu and Raginsky [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. With this purpose, we define minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable either by learning from the collected data or by knowing the environment and its dynamics. We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent and whose uncertainty is expressed by a prior distribution. One method for deriving upper bounds on the MBR is presented and specific bounds based on the relative entropy and the Wasserstein distance are given. We then focus on two particular cases of MDPs, the multi-armed bandit problem (MAB) and the online optimization with partial feedback problem. For the latter problem, we show that our bounds can recover from below the current information-theoretic bounds by Russo and Van Roy [2].

22.9 · LG · Apr 14
Instantiating Bayesian CVaR lower bounds in Interactive Decision Making Problems

Raghav Bongole, Tobias J. Oechtering, Mikael Skoglund

Recent work established a generalized-Fano framework for lower bounding prior-predictive (Bayesian) CVaR in interactive statistical decision making. In this paper, we show how to instantiate that framework in concrete interactive problems and derive explicit Bayesian CVaR lower bounds from its abstract corollaries. Our approach compares a hard model with a reference model using squared Hellinger distance, and combines a lower bound on a reference hinge term with a bound on the distinguishability of the two models. We apply this approach to canonical examples, including Gaussian bandits, and obtain explicit bounds that make the dependence on key problem parameters transparent. These results show how the generalized-Fano Bayesian CVaR framework can be used as a practical lower-bound tool for interactive learning and risk-sensitive decision making.

12.0 · IT · Apr 12
Context-aware Privacy Bounds for Linear Queries

Heng Zhao, Sara Saeidian, Tobias J. Oechtering

Linear queries, as the basis of broad analysis tasks, are often released through privacy mechanisms based on differential privacy (DP), the most popular framework for privacy protection. However, DP adopts a context-free definition that operates independently of the data-generating distribution. In this paper, we revisit the privacy analysis of the Laplace mechanism through the lens of pointwise maximal leakage (PML). We demonstrate that the distribution-agnostic definition of the DP framework often mandates excessive noise. To address this, we incorporate an assumption about the prior distribution by lower-bounding the probability of any single record belonging to any specific class. With this assumption, we derive a tight, context-aware leakage bound for general linear queries, and prove that our derived bound is strictly tighter than the standard DP guarantee and converges to the DP guarantee as this probability lower bound approaches zero. Numerical evaluations demonstrate that by exploiting this prior knowledge, the required noise scale can be reduced while maintaining privacy guarantees.
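For orientation, the following is a minimal sketch of the standard DP Laplace mechanism for a linear query, which is the baseline the abstract revisits. It does not implement the paper's context-aware PML bound. The function names, the bounded record range `[lower, upper]`, and the replace-one-record notion of neighbouring datasets are assumptions made for illustration.

```python
import random


def laplace_scale(weights, lower, upper, epsilon):
    """Noise scale b = sensitivity / epsilon for q(x) = sum_i w_i * x_i.

    Assumption: neighbouring datasets differ by replacing one record, so the
    L1 sensitivity is max_i |w_i| * (upper - lower).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = max(abs(w) for w in weights) * (upper - lower)
    return sensitivity / epsilon


def laplace_mechanism(records, weights, lower, upper, epsilon, rng=random):
    """Release the linear query with zero-mean Laplace noise of scale b."""
    true_answer = sum(w * x for w, x in zip(weights, records))
    b = laplace_scale(weights, lower, upper, epsilon)
    if b == 0:
        # Zero sensitivity: the answer does not depend on any single record.
        return float(true_answer)
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return true_answer + noise
```

A call such as `laplace_mechanism([1, 0, 1], [1, 1, 1], 0, 1, epsilon=1.0)` releases a noisy count. The noise scale here depends only on the sensitivity and `epsilon`, not on the data-generating distribution, which is the context-free behaviour the abstract contrasts with its prior-dependent bound.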

18.3 · IT · Apr 9
Empirical Coordination over Markov Channel with Independent Source

Mengyuan Zhao, Maël Le Treust, Tobias J. Oechtering

We study joint source-channel coding over Markov channels through the empirical coordination framework. More specifically, we aim at determining the empirical distributions of source and channel symbols that can be induced by a coding scheme. We consider strictly causal encoders that generate channel inputs, without access to the past channel states, henceforth driving the Markov state evolution. Our main results are single-letter inner and outer bounds on the set of achievable joint distributions, coordinating all the symbols in the network. To establish the inner bound, we introduce a new notion of typicality, the input-driven Markov typicality, and develop its fundamental properties. Contrary to the classical block-Markov coding schemes that rely on the blockwise independence for discrete memoryless channels, our analysis directly exploits the Markov channel structure and improves beyond the independence-based arguments.

33.8 · IT · Mar 13
Information Density Bounds for Privacy

Sara Saeidian, Leonhard Grosse, Parastoo Sadeghi et al.

This paper explores the implications of guaranteeing privacy by imposing a lower bound on the information density between the private and the public data. We introduce a novel and operationally meaningful privacy measure called pointwise maximal cost (PMC) and demonstrate that imposing an upper bound on PMC is equivalent to enforcing a lower bound on the information density. PMC quantifies the information leakage about a secret to adversaries who aim to minimize non-negative cost functions after observing the outcome of a privacy mechanism. When restricted to finite alphabets, PMC can equivalently be defined as the information leakage to adversaries aiming to minimize the probability of incorrectly guessing randomized functions of the secret. We study the properties of PMC and apply it to standard privacy mechanisms to demonstrate its practical relevance. Through a detailed examination, we connect PMC with other privacy measures that impose upper or lower bounds on the information density. These are pointwise maximal leakage (PML), local differential privacy (LDP), and (asymmetric) local information privacy. In particular, we show that a mechanism satisfies LDP if and only if it has both bounded PMC and bounded PML. Overall, our work fills a conceptual and operational gap in the taxonomy of privacy measures, bridges existing disconnects between different frameworks, and offers insights for selecting a suitable notion of privacy in a given application.

48.3 · IT · May 19
Worst-Case Utility Privacy Mechanism via Pointwise Maximal Leakage

Ci Song, Tobias J. Oechtering

We propose a discrete privacy mechanism exploiting beneficial properties of the novel privacy measure Pointwise Maximal Leakage (PML). Given the utility assignment characterized by every input-output letter pair, we study the mechanism design problem that satisfies PML privacy guarantees and maximizes the worst-case utility. Unlike popular privacy measures like Differential Privacy (DP), PML allows us to set some conditional probabilities in the mechanism to be zero, thereby preventing the occurrence of some low utilities while preserving a strict PML constraint. We show that the utility-safe mechanism, with low computational complexity, is optimal for the worst-case utility problem with an additional constraint on the output support set. We finally demonstrate the effectiveness in several numerical experiments. Due to DP's inability to have zeros in the mechanism, the design of privacy mechanisms that optimize the worst-case utility is underexplored, and this work shows that PML is a privacy measure that is perfectly suited for this purpose.

LG · Oct 21, 2024
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality

Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez et al.

We study agents acting in an unknown environment where the agent's goal is to find a robust policy. We consider robust policies as policies that achieve high cumulative rewards for all possible environments. To this end, we consider agents minimizing the maximum regret over different environment parameters, leading to the study of minimax regret. This research focuses on deriving information-theoretic bounds for minimax regret in Markov Decision Processes (MDPs) with a finite time horizon. Building on concepts from supervised learning, such as minimum excess risk (MER) and minimax excess risk, we use recent bounds on the Bayesian regret to derive minimax regret bounds. Specifically, we establish minimax theorems and use bounds on the Bayesian regret to perform minimax regret analysis using these minimax theorems. Our contributions include defining a suitable minimax regret in the context of MDPs, finding information-theoretic bounds for it, and applying these bounds in various scenarios.

ML · Dec 3, 2024
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering et al.

We study the performance of the Thompson Sampling algorithm for logistic bandit problems. In this setting, an agent receives binary rewards with probabilities determined by a logistic function, $\exp(β\langle a, θ\rangle)/(1+\exp(β\langle a, θ\rangle))$, with slope parameter $β>0$, and where both the action $a\in \mathcal{A}$ and parameter $θ\in \mathcal{O}$ lie within the $d$-dimensional unit ball. Adopting the information-theoretic framework introduced by Russo and Van Roy (2016), we analyze the information ratio, a statistic that quantifies the trade-off between the immediate regret incurred and the information gained about the optimal action. We improve upon previous results by establishing that the information ratio is bounded by $\tfrac{9}{2}dα^{-2}$, where $α$ is a minimax measure of the alignment between the action space $\mathcal{A}$ and the parameter space $\mathcal{O}$, and is independent of $β$. Using this result, we derive a bound of order $O(d/α\sqrt{T \log(βT/d)})$ on the Bayesian expected regret of Thompson Sampling incurred after $T$ time steps. To our knowledge, this is the first regret bound for logistic bandits that depends only logarithmically on $β$ while being independent of the number of actions. In particular, when the action space contains the parameter space, the bound on the expected regret is of order $\tilde{O}(d \sqrt{T})$.
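As a small illustration of the quantities named in this abstract, the sketch below evaluates the logistic reward probability and the stated information-ratio bound $\tfrac{9}{2}dα^{-2}$. The function names are hypothetical, and the alignment measure `alpha` is taken as a given input because the abstract does not say how it is computed.

```python
import math


def logistic_reward_prob(action, theta, beta):
    """Probability of reward 1: exp(beta*<a, theta>) / (1 + exp(beta*<a, theta>))."""
    z = beta * sum(a * t for a, t in zip(action, theta))
    # Branch on the sign of z so that math.exp never receives a large
    # positive argument (avoids overflow for steep slopes beta).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def information_ratio_bound(d, alpha):
    """Bound (9/2) * d * alpha**(-2) quoted in the abstract; alpha is an input."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 4.5 * d / alpha ** 2
```

When the action is orthogonal to the parameter, the inner product is zero and the reward probability is one half for any slope. Increasing `beta` sharpens the transition around that point, and the bound returned by `information_ratio_bound` does not change with `beta`.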

IT · Oct 7, 2025
Risk level dependent Minimax Quantile lower bounds for Interactive Statistical Decision Making

Raghav Bongole, Amirreza Zamani, Tobias J. Oechtering et al.

Minimax risk and regret focus on expectation, missing rare failures critical in safety-critical bandits and reinforcement learning. Minimax quantiles capture these tails. Three strands of prior work motivate this study: minimax-quantile bounds restricted to non-interactive estimation; unified interactive analyses that focus on expected risk rather than risk level specific quantile bounds; and high-probability bandit bounds that still lack a quantile-specific toolkit for general interactive protocols. To close this gap, within the interactive statistical decision making framework, we develop high-probability Fano and Le Cam tools and derive risk level explicit minimax-quantile bounds, including a quantile-to-expectation conversion and a tight link between strict and lower minimax quantiles. Instantiating these results for the two-armed Gaussian bandit immediately recovers optimal-rate bounds.

ML · Feb 17, 2025
Refined PAC-Bayes Bounds for Offline Bandits

Amaury Gouverneur, Tobias J. Oechtering, Mikael Skoglund

In this paper, we present refined probabilistic bounds on empirical reward estimates for off-policy learning in bandit problems. We build on the PAC-Bayesian bounds from Seldin et al. (2010) and improve on their results using a new parameter optimization approach introduced by Rodríguez et al. (2024). This technique is based on a discretization of the space of possible events to optimize the "in probability" parameter. We provide two parameter-free PAC-Bayes bounds, one based on Hoeffding-Azuma's inequality and the other based on Bernstein's inequality. We prove that our bounds are almost optimal as they recover the same rate as would be obtained by setting the "in probability" parameter after the realization of the data.

ML · Mar 5, 2024
Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering et al.

This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion analysis from [Dong and Van Roy, 2020], where they proved a bound with regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear bandit setting. We focus on bandit problems with a metric action space and, using a chaining argument, we establish new bounds that depend on the metric entropy of the action space for a variant of Thompson-Sampling. Under a suitable continuity assumption on the rewards, our bound offers a tight rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.

CV · Apr 10, 2020
Decentralized Differentially Private Segmentation with PATE

Dominik Fay, Jens Sjölund, Tobias J. Oechtering

When it comes to preserving privacy in medical machine learning, two important considerations are (1) keeping data local to the institution and (2) avoiding inference of sensitive information from the trained model. These are often addressed using federated learning and differential privacy, respectively. However, the commonly used Federated Averaging algorithm requires a high degree of synchronization between participating institutions. For this reason, we turn our attention to Private Aggregation of Teacher Ensembles (PATE), where all local models can be trained independently without inter-institutional communication. The purpose of this paper is thus to explore how PATE -- originally designed for classification -- can best be adapted for semantic segmentation. To this end, we build low-dimensional representations of segmentation masks which the student can obtain through low-sensitivity queries to the private aggregator. On the Brain Tumor Segmentation (BraTS 2019) dataset, an Autoencoder-based PATE variant achieves a higher Dice coefficient for the same privacy guarantee than prior work based on noisy Federated Averaging.

IR · Jan 19, 2020
On the Minimum Achievable Age of Information for General Service-Time Distributions

Jaya Prakash Champati, Ramana R. Avula, Tobias J. Oechtering et al.

There is a growing interest in analysing the freshness of data in networked systems. Age of Information (AoI) has emerged as a popular metric to quantify this freshness at a given destination. There has been a significant research effort in optimizing this metric in communication and networking systems under different settings. In contrast to previous works, we are interested in a fundamental question: what is the minimum achievable AoI in any single-server-single-source queuing system for a given service-time distribution? To address this question, we study a problem of optimizing AoI under service preemptions. Our main result is a characterization of the minimum achievable average peak AoI (PAoI). We obtain this result by showing that a fixed-threshold policy is optimal in the set of all randomized-threshold causal policies. We use the characterization to provide necessary and sufficient conditions for the service-time distributions under which preemptions are beneficial.

SY · Jul 14, 2017
Secure Estimation and Zero-Error Secrecy Capacity

Moritz Wiese, Tobias J. Oechtering, Karl Henrik Johansson et al.

We study the problem of securely estimating the states of an unstable dynamical system subject to nonstochastic disturbances. The estimator obtains all its information through an uncertain channel which is subject to nonstochastic disturbances as well, and an eavesdropper obtains a disturbed version of the channel inputs through a second uncertain channel. An encoder observes and block-encodes the states in such a way that, upon sending the generated codeword, the estimator's error is bounded and such that a security criterion is satisfied ensuring that the eavesdropper obtains as little state information as possible. Two security criteria are considered and discussed with the help of a numerical example. A sufficient condition on the uncertain wiretap channel, i.e., the pair formed by the uncertain channel from encoder to estimator and the uncertain channel from encoder to eavesdropper, is derived which ensures that a bounded estimation error and security are achieved. This condition is also shown to be necessary for a subclass of uncertain wiretap channels. To formulate the condition, the zero-error secrecy capacity of uncertain wiretap channels is introduced, i.e., the maximal rate at which data can be transmitted from the encoder to the estimator in such a way that the eavesdropper is unable to reconstruct the transmitted data. Lastly, the zero-error secrecy capacity of uncertain wiretap channels is studied.