Daniel Wälchli

2 Papers

LG, Mar 24, 2022
Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning

Pascal Weber, Daniel Wälchli, Mustafa Zeqiri et al.

We present the extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state-of-the-art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator, and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated using individual rewards and we ignore the effects of other actions on the transition map. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that employing a single feed-forward neural network for the policy and the value function in ReF-ER MARL outperforms state-of-the-art algorithms that rely on complex neural network architectures.
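The abstract does not spell out how the importance weights are computed, so the following is only a minimal sketch under stated assumptions. It assumes the usual ReF-ER convention that a sample's importance weight is the ratio of current-policy to behavior-policy probability, and that a sample counts as "near-policy" when the weight lies in (1/c_max, c_max). The function names, the `c_max` value, and the product form of the joint weight are hypothetical illustrations, not the paper's implementation.

```python
import math

def importance_weight(logp_current, logp_behavior):
    # Ratio pi(a|s) / mu(a|s), computed from log-probabilities.
    return math.exp(logp_current - logp_behavior)

def is_near_policy(rho, c_max):
    # Assumed ReF-ER-style test: keep the sample only if 1/c_max < rho < c_max.
    return 1.0 / c_max < rho < c_max

def joint_weight(per_agent_rhos):
    # Hypothetical joint weight: product of the per-agent importance weights.
    return math.prod(per_agent_rhos)

# Three agents, each individually near-policy for c_max = 4.
rhos = [2.0, 2.0, 2.0]
c_max = 4.0
individual_ok = [is_near_policy(r, c_max) for r in rhos]
joint_ok = is_near_policy(joint_weight(rhos), c_max)
```

In this toy case every per-agent weight passes the test while the product (8.0) does not, which illustrates one reason a per-agent weighting could retain more replay samples than a joint one.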

CO, Mar 20, 2018
Langevin Diffusion for Population Based Sampling with an Application in Bayesian Inference for Pharmacodynamics

Georgios Arampatzis, Daniel Wälchli, Panagiotis Angelikopoulos et al.

We propose an algorithm for the efficient and robust sampling of the posterior probability distribution in Bayesian inference problems. The algorithm combines the local search capabilities of the Manifold Metropolis Adjusted Langevin transition kernels with the global exploration of a population-based sampling algorithm, the Transitional Markov Chain Monte Carlo (TMCMC). The Langevin diffusion process is determined by either the Hessian or the Fisher Information of the target distribution, with appropriate modifications when these matrices are not positive definite. The present method is shown to be superior to other population-based algorithms in sampling probability distributions for which gradients are available, and is shown to handle otherwise unidentifiable models. We demonstrate the capabilities and advantages of the method by computing the posterior distribution of the parameters of a pharmacodynamics model for glioma growth and its drug-induced inhibition, using clinical data.
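To make the Langevin transition kernel concrete, here is a minimal one-dimensional sketch of a curvature-preconditioned Metropolis Adjusted Langevin step. It is an illustration under assumptions, not the paper's algorithm: the standard normal target, the step size `eps`, and the absolute-value-plus-floor fix for non-positive curvature are all hypothetical choices, the metric-derivative terms of full manifold MALA are dropped, and the TMCMC population stage is omitted entirely.

```python
import math
import random

def log_target(x):
    # Hypothetical target: standard normal, up to an additive constant.
    return -0.5 * x * x

def grad_log_target(x):
    return -x

def neg_hessian(x):
    # Negative second derivative of log_target (constant for this target).
    return 1.0

def regularized_metric(curvature, floor=1e-3):
    # Assumed fix for non-positive curvature: absolute value with a floor.
    return max(abs(curvature), floor)

def proposal_params(x, eps):
    # Langevin proposal preconditioned by the inverse regularized metric.
    g_inv = 1.0 / regularized_metric(neg_hessian(x))
    mean = x + 0.5 * eps ** 2 * g_inv * grad_log_target(x)
    return mean, eps ** 2 * g_inv

def log_normal_pdf(y, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mean) ** 2 / var

def mala_step(x, eps, rng):
    mean_fwd, var_fwd = proposal_params(x, eps)
    y = rng.gauss(mean_fwd, math.sqrt(var_fwd))
    mean_rev, var_rev = proposal_params(y, eps)
    # Metropolis-Hastings correction with forward and reverse proposal densities.
    log_alpha = (log_target(y) + log_normal_pdf(x, mean_rev, var_rev)
                 - log_target(x) - log_normal_pdf(y, mean_fwd, var_fwd))
    if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
        return y, True
    return x, False

def run_chain(n_steps, x0=3.0, eps=1.0, seed=0):
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        x, ok = mala_step(x, eps, rng)
        samples.append(x)
        accepted += ok
    return samples, accepted / n_steps
```

Calling `run_chain(5000)` returns the chain and its acceptance rate. Because the accept/reject step uses the proposal densities at both the current and the proposed point, the chain still targets `log_target` even though the preconditioner is simplified.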