J. C. Schoeman

4 Papers

LG · Nov 18, 2022
Credit-cognisant reinforcement learning for multi-agent cooperation

F. Bredell, H. A. Engelbrecht, J. C. Schoeman

Traditional multi-agent reinforcement learning (MARL) algorithms, such as independent Q-learning, struggle in partially observable scenarios and in scenarios where agents are required to develop delicate action sequences. This is often because the reward for a good action only becomes available after other agents have taken theirs, so the earlier action is not credited accordingly. Recurrent neural networks have proven to be a viable strategy for solving these types of problems, resulting in a significant performance increase when compared to other methods. In this paper, we explore a different approach and focus on the experiences used to update the action-value functions of each agent. We introduce the concept of credit-cognisant rewards (CCRs), which allow an agent to perceive the effect its actions had on the environment as well as on its co-agents. We show that by manipulating these experiences, constructing the reward contained within each of them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning as well as deep recurrent Q-learning. We evaluate and test the performance of CCRs when applied to deep reinforcement learning techniques using a simplified version of the popular card game Hanabi.
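The reward construction described in this abstract can be sketched in Python. This is a minimal illustration under our own assumption, not the authors' implementation: an "action sequence" is taken to be a fixed number of consecutive turns (for example, one round in which each agent acts once), and the `Transition` type and `credit_cognisant_rewards` function are hypothetical names. Each stored experience has its own reward replaced by the sum of the rewards received by all agents in its sequence.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Transition:
    agent: int      # index of the agent that acted
    action: int     # chosen action (an index into that agent's action space)
    reward: float   # reward observed right after this action


def credit_cognisant_rewards(episode: List[Transition], sequence_length: int) -> List[Transition]:
    """Replace each transition's reward with the total reward earned in its action sequence.

    Assumption (hypothetical grouping): turns are split into consecutive,
    non-overlapping sequences of `sequence_length` turns, e.g. one round in
    which every agent acts once. A shorter final sequence is summed over
    whatever turns it contains.
    """
    if sequence_length < 1:
        raise ValueError("sequence_length must be at least 1")
    reshaped: List[Transition] = []
    for start in range(0, len(episode), sequence_length):
        sequence = episode[start:start + sequence_length]
        shared = sum(t.reward for t in sequence)  # rewards received by all agents in the sequence
        reshaped.extend(replace(t, reward=shared) for t in sequence)
    return reshaped


# Two agents alternate turns; agent 0's hint only pays off when agent 1 plays.
episode = [
    Transition(agent=0, action=3, reward=0.0),  # hint (no immediate reward)
    Transition(agent=1, action=7, reward=1.0),  # successful play
    Transition(agent=0, action=2, reward=0.0),
    Transition(agent=1, action=5, reward=0.0),
]
print([t.reward for t in credit_cognisant_rewards(episode, sequence_length=2)])
# [1.0, 1.0, 0.0, 0.0]
```

Here the hint in the first round earns nothing on its own, but after reshaping it is credited with the point scored by the co-agent's play. The abstract does not say exactly how turns are grouped into a sequence, so the grouping rule is the main assumption to revisit.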

LG · Nov 19, 2023
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions

E. Visser, C. E. van Daalen, J. C. Schoeman

Bayesian optimization (BO) is a popular method for optimizing expensive black-box functions. BO has several well-documented shortcomings, including computational slowdown with longer optimization runs, poor suitability for non-stationary or ill-conditioned objective functions, and poor convergence characteristics. Several algorithms have been proposed that incorporate local strategies, such as trust regions, into BO to mitigate these limitations; however, none address all of them satisfactorily. To address these shortcomings, we propose the LABCAT algorithm, which extends trust-region-based BO by adding a rotation aligning the trust region with the weighted principal components and an adaptive rescaling strategy based on the length-scales of a local Gaussian process surrogate model with automatic relevance determination. Through extensive numerical experiments using a set of synthetic test functions and the well-known COCO benchmarking software, we show that the LABCAT algorithm outperforms several state-of-the-art BO and other black-box optimization algorithms.
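The two local transformations named in this abstract, aligning the trust region with weighted principal components and rescaling by Gaussian process length-scales, can be sketched with NumPy. This is a simplified illustration, not the LABCAT algorithm itself: the rank-based weighting is an assumption, `principal_component_frame` and `to_local_coordinates` are hypothetical names, and the length-scales are passed in as fixed values instead of being fitted by a local Gaussian process with automatic relevance determination.

```python
import numpy as np


def principal_component_frame(X: np.ndarray, y: np.ndarray):
    """Return a weighted centroid and a rotation whose columns are principal directions.

    Assumption (hypothetical weighting): for minimisation, each point gets a
    weight that grows linearly with its rank (best point = largest weight).
    """
    n = X.shape[0]
    ranks = np.argsort(np.argsort(y))      # 0 for the lowest (best) objective value
    w = (n - ranks).astype(float)
    w /= w.sum()
    centre = w @ X                          # weighted centroid, shape (d,)
    D = X - centre
    cov = (D * w[:, None]).T @ D            # weighted covariance, shape (d, d)
    _, eigvecs = np.linalg.eigh(cov)        # orthonormal eigenvectors, ascending eigenvalues
    R = eigvecs[:, ::-1]                    # reorder to descending variance
    return centre, R


def to_local_coordinates(X, centre, R, length_scales):
    """Rotate points into the principal-component frame, then divide each axis by its length-scale."""
    return ((X - centre) @ R) / length_scales


# Demo: points stretched along the diagonal direction (1, 1) / sqrt(2).
rng = np.random.default_rng(0)
t = rng.normal(size=50)
X = np.column_stack([t, t]) + 0.05 * rng.normal(size=(50, 2))
y = np.sum(X**2, axis=1)                    # a simple objective to minimise
centre, R = principal_component_frame(X, y)
# Hypothetical length-scales; in LABCAT these would come from a local GP with ARD.
Z = to_local_coordinates(X, centre, R, np.array([2.0, 0.1]))
```

Rotating first means that per-axis (ARD) length-scales act along the principal directions of the better observations, so a valley running diagonally through the search space can still be captured with one independent length-scale per axis.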

MA · Dec 9, 2024
Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi

F. Bredell, H. A. Engelbrecht, J. C. Schoeman

The card game Hanabi is considered a strong medium for the testing and development of multi-agent reinforcement learning (MARL) algorithms, due to its cooperative nature, partial observability, limited communication and remarkable complexity. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on advanced architecture design and algorithmic manipulations to achieve state-of-the-art performance for various numbers of cooperators. However, this often leads to complex solution strategies that have a high computational cost and require large amounts of training data. To play Hanabi effectively, humans rely on conventions, which provide a means to implicitly convey ideas or knowledge based on a predefined, mutually agreed-upon set of "rules" or principles. Multi-agent problems with partial observability, especially when communication is limited, can benefit greatly from such implicit knowledge sharing. In this paper, we propose a novel approach to augmenting an agent's action space using conventions, which act as sequences of special cooperative actions that span multiple time steps and multiple agents and require agents to actively opt in for them to reach fruition. These conventions are based on existing human conventions and result in a significant improvement on the performance of existing techniques for self-play and cross-play for various numbers of cooperators within Hanabi.
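One way to picture an action space augmented with conventions is sketched here in Python. The mechanism is our assumption, not the paper's: a convention is a fixed sequence of primitive actions, one per consecutive turn; an agent initiates it by selecting the convention as an action, each following agent opts in by selecting the same convention, and choosing any primitive action abandons it. `Convention`, `ConventionTracker` and the Hanabi-style action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Convention:
    name: str
    steps: Tuple[str, ...]  # primitive action taken on each consecutive turn of the convention


Action = Union[str, Convention]


def augmented_action_space(primitive: List[str], conventions: List[Convention]) -> List[Action]:
    """Primitive actions plus one extra action per convention."""
    return list(primitive) + list(conventions)


class ConventionTracker:
    """Shared turn-by-turn state of the convention currently in progress, if any."""

    def __init__(self) -> None:
        self.active: Optional[Convention] = None
        self.step = 0

    def resolve(self, choice: Action) -> Tuple[str, bool]:
        """Map the current agent's choice to the primitive action executed this turn.

        Returns (primitive_action, completed); completed is True on the turn
        that finishes a convention.
        """
        if isinstance(choice, Convention):
            if choice != self.active:
                # Initiate (or switch to) a convention at its first step.
                self.active, self.step = choice, 0
            # Otherwise the agent has opted in and the convention advances one step.
            action = choice.steps[self.step]
            self.step += 1
            completed = self.step == len(choice.steps)
            if completed:
                self.active, self.step = None, 0
            return action, completed
        # A primitive action means no opt-in: any convention in progress is abandoned.
        self.active, self.step = None, 0
        return choice, False


# Hypothetical two-step convention: hint a card, then the next player plays it.
hint_then_play = Convention("hint-then-play", ("hint_red_to_next", "play_slot_1"))
space = augmented_action_space(["play_slot_1", "discard_slot_1", "hint_red_to_next"], [hint_then_play])
tracker = ConventionTracker()
print(tracker.resolve(hint_then_play))  # agent 0 initiates: ('hint_red_to_next', False)
print(tracker.resolve(hint_then_play))  # agent 1 opts in:   ('play_slot_1', True)
```

To a learning agent, the convention is simply one more entry in `space`; the opt-in requirement shows up as the convention being abandoned whenever the next agent picks a primitive action instead.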

LG · Apr 30, 2021
Degenerate Gaussian factors for probabilistic inference

J. C. Schoeman, C. E. van Daalen, J. A. du Preez

In this paper, we propose a parametrised factor that enables inference on Gaussian networks where linear dependencies exist among the random variables. Our factor representation is effectively a generalisation of traditional Gaussian parametrisations where the positive-definite constraint of the covariance matrix has been relaxed. For this purpose, we derive various statistical operations and results (such as marginalisation, multiplication and affine transformations of random variables) that extend the capabilities of Gaussian factors to these degenerate settings. By using this principled factor definition, degeneracies can be accommodated accurately and automatically at little additional computational cost. As an illustration, we apply our methodology to a representative example involving recursive state estimation of cooperative mobile robots.
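A short worked example shows why a linear dependency breaks the traditional parametrisation. This is a standard Gaussian result, not the authors' factor definition; the symbols a, A and v are introduced here only for illustration. Appending a variable that is an exact linear function of the others yields a covariance matrix that is positive semi-definite but singular:

```latex
\begin{aligned}
\mathbf{x} &\sim \mathcal{N}(\boldsymbol{\mu}, \Sigma), \quad \Sigma \succ 0,\ \mathbf{x} \in \mathbb{R}^{2}, \qquad y = \mathbf{a}^{\top}\mathbf{x},\\
\mathbf{z} &= \begin{bmatrix} \mathbf{x} \\ y \end{bmatrix} = A\mathbf{x}, \qquad A = \begin{bmatrix} I_{2} \\ \mathbf{a}^{\top} \end{bmatrix} \in \mathbb{R}^{3 \times 2},\\
\mathbf{z} &\sim \mathcal{N}\!\left(A\boldsymbol{\mu},\ A\Sigma A^{\top}\right), \qquad \operatorname{rank}\!\left(A\Sigma A^{\top}\right) = 2 < 3.
\end{aligned}
```

Choosing v = (a, -1) gives v^T A = a^T - a^T = 0, so Var(v^T z) = v^T A Sigma A^T v = 0. The joint covariance therefore has no inverse, so neither the usual density on R^3 nor the canonical (information) form built from its inverse exists; this is the kind of degeneracy the proposed factor is designed to accommodate.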