Jasper Goseling

h-index1

3papers

5citations

Novelty53%

AI Score34

Ranked #129,466 of 206,010 authors (top 63%)#3,515 in CR (top 48%)

3 Papers

MLOct 29, 2025

Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains

Maik Overmars, Jasper Goseling, Richard Boucherie

We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combination of off-policy learning and function approximation can lead to divergence of the algorithm. Existing results for this setting modify the algorithm, for instance by reweighing the updates using importance sampling. This establishes convergence at the expense of additional complexity. In contrast, our approach is to analyse the standard algorithm, but to restrict our attention to the class of reversible Markov chains. We demonstrate convergence under this mild reversibility condition on the structure of the chain, which in many applications can be assumed using domain knowledge. In particular, we establish a convergence guarantee under an upper bound on the discount factor in terms of the difference between the on-policy and off-policy process. This improves upon known results in the literature that state that convergence holds for a sufficiently small discount factor by establishing an explicit bound. Convergence is with probability one and achieves projected Bellman error equal to zero. To obtain these results, we adapt the stochastic approximation framework that was used by Tsitsiklis and Van Roy [1997 for the on-policy case, to the off-policy case. We illustrate our results using different types of reversible Markov chains, such as one-dimensional random walks and random walks on a weighted graph.

CRJan 22, 2021

The Privacy-Utility Tradeoff of Robust Local Differential Privacy

Milan Lopuhaä-Zwakenberg, Jasper Goseling

We consider data release protocols for data $X=(S,U)$, where $S$ is sensitive; the released data $Y$ contains as much information about $X$ as possible, measured as $\operatorname{I}(X;Y)$, without leaking too much about $S$. We introduce the Robust Local Differential Privacy (RLDP) framework to measure privacy. This framework relies on the underlying distribution of the data, which needs to be estimated from available data. Robust privacy guarantees are ensuring privacy for all distributions in a given set $\mathcal{F}$, for which we study two cases: when $\mathcal{F}$ is the set of all distributions, and when $\mathcal{F}$ is a confidence set arising from a $χ^2$ test on a publicly available dataset. In the former case we introduce a new release protocol which we prove to be optimal in the low privacy regime. In the latter case we present four algorithms that construct RLDP protocols from a given dataset. One of these approximates $\mathcal{F}$ by a polytope and uses results from robust optimisation to yield high utility release protocols. However, this algorithm relies on vertex enumeration and becomes computationally inaccessible for large input spaces. The other three algorithms are low-complexity and build on randomised response. Experiments verify that all four algorithms offer significantly improved utility over regular LDP.

SYOct 13, 2015

Data retrieval time for energy harvesting wireless sensor networks

Mihaela Mitici, Jasper Goseling, Maurits de Graaf et al.

We consider the problem of retrieving a reliable estimate of an attribute monitored by a wireless sensor network, where the sensors harvest energy from the environment independently, at random. Each sensor stores the harvested energy in batteries of limited capacity. Moreover, provided they have sufficient energy, the sensors broadcast their measurements in a decentralized fashion. Clients arrive at the sensor network according to a Poisson process and are interested in retrieving a fixed number of sensor measurements, based on which a reliable estimate is computed. We show that the time until an arbitrary sensor broadcasts has a phase-type distribution. Based on this result and the theory of order statistics of phase-type distributions, we determine the probability distribution of the time needed for a client to retrieve a reliable estimate of an attribute monitored by the sensor network. We also provide closed-form expression for the retrieval time of a reliable estimate when the capacity of the sensor battery or the rate at which energy is harvested is asymptotically large. In addition, we analyze numerically the retrieval time of a reliable estimate for various sizes of the sensor network, maximum capacity of the sensor batteries and rate at which energy is harvested. These results show that the energy harvesting rate and the broadcasting rate are the main parameters that influence the retrieval time of a reliable estimate, while deploying sensors with large batteries does not significantly reduce the retrieval time.