LG, Feb 1, 2023
Off-the-Grid MARL: Datasets with Baselines for Offline Multi-Agent Reinforcement Learning
Claude Formanek, Asad Jeewa, Jonathan Shock et al.
Being able to harness the power of large datasets for developing cooperative multi-agent controllers promises to unlock enormous value for real-world applications. Many important industrial systems are multi-agent in nature and are difficult to model using bespoke simulators. However, in industry, distributed processes can often be recorded during operation, and large quantities of demonstrative data stored. Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective decentralised controllers from such datasets. However, offline MARL is still in its infancy and therefore lacks standardised benchmark datasets and baselines typically found in more mature subfields of reinforcement learning (RL). These deficiencies make it difficult for the community to sensibly measure progress. In this work, we aim to fill this gap by releasing off-the-grid MARL (OG-MARL): a growing repository of high-quality datasets with baselines for cooperative offline MARL research. Our datasets provide settings that are characteristic of real-world systems, including complex environment dynamics, heterogeneous agents, non-stationarity, many agents, partial observability, suboptimality, sparse rewards and demonstrated coordination. For each setting, we provide a range of different dataset types (e.g. Good, Medium, Poor, and Replay) and profile the composition of experiences for each dataset. We hope that OG-MARL will serve the community as a reliable source of datasets and help drive progress, while also providing an accessible entry point for researchers new to the field.
AI, Mar 31, 2023
Selective Reincarnation: Offline-to-Online Multi-Agent Reinforcement Learning
Claude Formanek, Callum Rhys Tilbury, Jonathan Shock et al.
'Reincarnation' in reinforcement learning has been proposed as a formalisation of reusing prior computation from past experiments when training an agent in an environment. In this paper, we present a brief foray into the paradigm of reincarnation in the multi-agent (MA) context. We consider the case where only some agents are reincarnated, whereas the others are trained from scratch -- selective reincarnation. In the fully-cooperative MA setting with heterogeneous agents, we demonstrate that selective reincarnation can lead to higher returns than training fully from scratch, and faster convergence than training with full reincarnation. However, the choice of which agents to reincarnate in a heterogeneous system is vitally important to the outcome of the training -- in fact, a poor choice can lead to considerably worse results than the alternatives. We argue that a rich field of work exists here, and we hope that our effort catalyses further energy in bringing the topic of reincarnation to the multi-agent realm.
AI, Dec 30, 2025, Code
Graph-Based Exploration for ARC-AGI-3 Interactive Reasoning Tasks
Evgenii Rudakov, Jonathan Shock, Benjamin Ultan Cowley
We present a training-free graph-based approach for solving interactive reasoning tasks in the ARC-AGI-3 benchmark. ARC-AGI-3 comprises game-like tasks where agents must infer task mechanics through limited interactions, and adapt to increasing complexity as levels progress. Success requires forming hypotheses, testing them, and tracking discovered mechanics. The benchmark has revealed that state-of-the-art LLMs are currently incapable of reliably solving these tasks. Our method combines vision-based frame processing with systematic state-space exploration using graph-structured representations. It segments visual frames into meaningful components, prioritizes actions based on visual salience, and maintains a directed graph of explored states and transitions. By tracking visited states and tested actions, the agent prioritizes actions that provide the shortest path to untested state-action pairs. On the ARC-AGI-3 Preview Challenge, this structured exploration strategy solves a median of 30 out of 52 levels across six games and ranks 3rd on the private leaderboard, substantially outperforming frontier LLM-based agents. These results demonstrate that explicit graph-structured exploration, even without learning, can serve as a strong baseline for interactive reasoning and underscore the importance of systematic state tracking and action prioritization in sparse-feedback environments where current LLMs fail to capture task dynamics. The code is open source and available at https://github.com/dolphin-in-a-coma/arc-agi-3-just-explore.
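The exploration rule described in this abstract, moving along the shortest path to an untested state-action pair, can be illustrated with a breadth-first search over the graph of observed transitions. This is a minimal sketch and not the authors' implementation: the function name `path_to_untested`, the dictionary-based graph layout, and the assumption that every action is available in every state are all hypothetical, and the visual salience prioritisation mentioned in the abstract is left out.

```python
from collections import deque


def path_to_untested(graph, tested, actions, start):
    """Return the shortest action sequence from `start` whose last step is untested.

    graph:   dict mapping state -> {action: next_state} for observed transitions
    tested:  set of (state, action) pairs that have already been tried
    actions: list of actions, assumed available in every state (hypothetical)
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        # An untested action in this state ends the search.
        for action in actions:
            if (state, action) not in tested:
                return path + [action]
        # Otherwise expand along transitions that have already been observed.
        for action, nxt in graph.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None  # every reachable state-action pair has been tested


# Hypothetical two-state example: only ("s1", "b") is still untested.
graph = {"s0": {"a": "s1", "b": "s0"}, "s1": {"a": "s0"}}
tested = {("s0", "a"), ("s0", "b"), ("s1", "a")}
plan = path_to_untested(graph, tested, ["a", "b"], "s0")
```

Because breadth-first search visits states in order of distance from the start, the first untested pair it encounters is also one of the closest, which is why no explicit distance bookkeeping is needed.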
CY, Mar 20
Assessing the Case for Africa-Centric AI Safety Evaluations
Gathoni Ireri, Cecil Abungu, Jean Cheptumo et al.
Frontier AI systems are being adopted across Africa, yet most AI safety evaluations are designed and validated in Western environments. In this paper, we argue that the portability gap can leave Africa-centric pathways to severe harm untested when frontier AI systems are embedded in materially constrained and interdependent infrastructures. We define severe AI risks as material risks from frontier AI systems that result in critical harm, measured as the grave injury or death of thousands of people or economic loss and damage equivalent to five percent of a country's GDP. To support AI safety evaluation design, we develop a taxonomy for identifying Africa-centric severe AI risks. The taxonomy links outcome thresholds to process pathways that model risk as the intersection of hazard, vulnerability, and exposure. We distinguish severe risks by amplification and suddenness, where amplification requires that frontier AI be a necessary magnifier of latent danger and suddenness captures harms that materialise rapidly enough to overwhelm ordinary coping and governance capacity. We then propose threat modelling strategies for African contexts, surveying reference class forecasting, structured expert elicitation, scenario planning, and system theoretic process analysis, and tailoring them to constraints of limited resources, poor connectivity, limited technical expertise, weak state capacity, and conflict. We also examine AI misalignment risk, concluding that Africa is more likely to expose universal failure modes through distributional shift than to generate distinct pathways of misalignment. Finally, we offer practical guidance for running evaluations under resource constraints, emphasising open and extensible tooling, tiered evaluation pipelines, and sharing methods and findings to broaden evaluation scope.
HC, Jul 8, 2025, Code
SSSUMO: Real-Time Semi-Supervised Submovement Decomposition
Evgenii Rudakov, Jonathan Shock, Otto Lappi et al.
This paper introduces SSSUMO, a semi-supervised deep learning approach for submovement decomposition that achieves state-of-the-art accuracy and speed. While submovement analysis offers valuable insights into motor control, existing methods struggle with reconstruction accuracy, computational cost, and validation, due to the difficulty of obtaining hand-labeled data. We address these challenges using a semi-supervised learning framework. This framework learns from synthetic data, initially generated from minimum-jerk principles and then iteratively refined through adaptation to unlabeled human movement data. Our fully convolutional architecture with differentiable reconstruction significantly surpasses existing methods on both synthetic and diverse human motion datasets, demonstrating robustness even in high-noise conditions. Crucially, the model operates in real-time (less than a millisecond per input second), a substantial improvement over optimization-based techniques. This enhanced performance facilitates new applications in human-computer interaction, rehabilitation medicine, and motor control studies. We demonstrate the model's effectiveness across diverse human-performed tasks such as steering, rotation, pointing, object moving, handwriting, and mouse-controlled gaming, showing notable improvements particularly on challenging datasets where traditional methods largely fail. Training and benchmarking source code, along with pre-trained model weights, are made publicly available at https://github.com/dolphin-in-a-coma/sssumo.
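To make "generated from minimum-jerk principles" concrete, the sketch below builds a synthetic velocity trace as a sum of standard minimum-jerk submovements, each with speed profile v(t) = (A/T) * 30 * tau^2 * (1 - tau)^2 where tau = (t - onset)/T. It assumes the textbook minimum-jerk form; the abstract does not specify the exact generator the authors use, and the function names and the (onset, duration, amplitude) parameterisation are hypothetical.

```python
def min_jerk_velocity(t, onset, duration, amplitude):
    """Velocity of one minimum-jerk submovement at time t (zero outside its span)."""
    tau = (t - onset) / duration
    if tau <= 0.0 or tau >= 1.0:
        return 0.0
    return amplitude / duration * 30.0 * tau ** 2 * (1.0 - tau) ** 2


def compose(times, submovements):
    """Sum overlapping submovements, each given as (onset, duration, amplitude)."""
    return [sum(min_jerk_velocity(t, *s) for s in submovements) for t in times]


# Hypothetical trace: two overlapping submovements sampled at 1 kHz for 2 seconds.
dt = 0.001
times = [i * dt for i in range(2000)]
velocity = compose(times, [(0.0, 1.0, 2.0), (0.6, 1.0, 1.0)])
# Integrating velocity recovers the total displacement (sum of amplitudes).
displacement = sum(velocity) * dt
```

A decomposition method has to solve the inverse problem: given only `velocity`, recover the onsets, durations, and amplitudes of the overlapping components.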
LG, Sep 29, 2025
Optimisation of Resource Allocation in Heterogeneous Wireless Networks Using Deep Reinforcement Learning
Oluwaseyi Giwa, Jonathan Shock, Jaco Du Toit et al.
Dynamic resource allocation in heterogeneous wireless networks (HetNets) is challenging for traditional methods under varying user loads and channel conditions. We propose a deep reinforcement learning (DRL) framework that jointly optimises transmit power, bandwidth, and scheduling via a multi-objective reward balancing throughput, energy efficiency, and fairness. Using real base station coordinates, we compare Proximal Policy Optimisation (PPO) and Twin Delayed Deep Deterministic Policy Gradient (TD3) against three heuristic algorithms in multiple network scenarios. Our results show that DRL frameworks outperform heuristic algorithms in optimising resource allocation in dynamic networks. These findings highlight key trade-offs in DRL design for future HetNets.
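A multi-objective reward of the kind described here can be sketched as a weighted sum of throughput, energy efficiency, and a fairness score. The abstract does not give the reward definition, so everything below is an assumption for illustration: Jain's fairness index is used as the fairness term, and the function names, the weights, and the absence of normalisation are hypothetical choices.

```python
def jain_fairness(rates):
    """Jain's index: 1.0 when all users get equal rates, 1/n when one user gets everything."""
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 0.0
    return total * total / (len(rates) * squares)


def reward(rates, power_watts, w_throughput=1.0, w_energy=1.0, w_fairness=1.0):
    """Weighted sum of throughput, energy efficiency, and fairness (illustrative weights)."""
    throughput = sum(rates)
    energy_efficiency = throughput / power_watts if power_watts > 0 else 0.0
    return (w_throughput * throughput
            + w_energy * energy_efficiency
            + w_fairness * jain_fairness(rates))


# Hypothetical step: two users at equal rates, 2 W of total transmit power.
r = reward([1.0, 1.0], 2.0)
```

In practice the three terms live on very different scales, so each would usually be normalised before weighting; otherwise the throughput term can dominate the other two.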
NE, Nov 25, 2025
Energy Costs and Neural Complexity Evolution in Changing Environments
Sian Heesom-Green, Jonathan Shock, Geoff Nitschke
The Cognitive Buffer Hypothesis (CBH) posits that larger brains evolved to enhance survival in changing conditions. However, larger brains also carry higher energy demands, imposing additional metabolic burdens. Alongside brain size, brain organization plays a key role in cognitive ability and, with suitable architectures, may help mitigate energy challenges. This study evolves Artificial Neural Networks (ANNs) used by Reinforcement Learning (RL) agents to investigate how environmental variability and energy costs influence the evolution of neural complexity, defined in terms of ANN size and structure. Results indicate that under energy constraints, increasing seasonality led to smaller ANNs. This challenges CBH and supports the Expensive Brain Hypothesis (EBH), as highly seasonal environments reduced net energy intake and thereby constrained brain size. ANN structural complexity primarily emerged as a byproduct of size, where energy costs promoted the evolution of more efficient networks. These results highlight the role of energy constraints in shaping neural complexity, offering in silico support for biological theory and energy-efficient robotic design.
CY, Aug 12, 2025
Toward an African Agenda for AI Safety
Samuel T. Segun, Rachel Adams, Ana Florido et al.
This paper maps Africa's distinctive AI risk profile, from deepfake-fuelled electoral interference and data colonial dependency to compute scarcity, labour disruption and disproportionate exposure to climate-driven environmental costs. While major benefits are promised to accrue, the availability, development and adoption of AI also mean that African people and countries face particular AI safety risks, from large-scale labour market disruptions to the nefarious use of AI to manipulate public opinion. To date, African perspectives have not been meaningfully integrated into global debates and processes regarding AI safety, leaving African stakeholders with limited influence over the emerging global AI safety governance agenda. While there are Computer Incident Response Teams on the continent, none hosts a dedicated AI Safety Institute or office. We propose a five-point action plan centred on (i) a policy approach that foregrounds the protection of the human rights of those most vulnerable to experiencing the harmful socio-economic effects of AI; (ii) the establishment of an African AI Safety Institute; (iii) the promotion of public AI literacy and awareness; (iv) the development of an early warning system with inclusive benchmark suites for 25+ African languages; and (v) an annual AU-level AI Safety & Security Forum.
LG, Jun 13, 2024
Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation
Claude Formanek, Callum Rhys Tilbury, Louise Beyers et al.
Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. Unfortunately, the current state of research in offline MARL is plagued by inconsistencies in baselines and evaluation protocols, which ultimately makes it difficult to accurately assess progress, trust newly proposed innovations, and allow researchers to easily build upon prior work. In this paper, we firstly identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms through a representative study of published offline MARL work. Secondly, by directly comparing to this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, we show that on 35 out of 47 datasets used in prior work (almost 75% of cases), we match or surpass the performance of the current purported SOTA. Strikingly, our baselines often substantially outperform these more sophisticated algorithms. Finally, we correct for the shortcomings highlighted from this prior work by introducing a straightforward standardised methodology for evaluation and by providing our baseline implementations with statistically robust results across several scenarios, useful for comparisons in future work. Our proposal includes simple and sensible steps that are easy to adopt, which in combination with solid baselines and comparative results, could substantially improve the overall rigour of empirical science in offline MARL moving forward.
NC, Jan 12, 2022
Brain Structural Saliency Over The Ages
Daniel Taylor, Jonathan Shock, Deshendran Moodley et al.
Brain Age (BA) estimation via Deep Learning has become a strong and reliable bio-marker for brain health, but the black-box nature of Neural Networks does not easily allow insight into the features of brain ageing. We trained a ResNet model as a BA regressor on T1 structural MRI volumes from a small cross-sectional cohort of 524 individuals. Using Layer-wise Relevance Propagation (LRP) and DeepLIFT saliency mapping techniques, we analysed the trained model to determine the most relevant structures for brain ageing for the network, and compared these between the saliency mapping techniques. We show the change in attribution of relevance to different brain regions through the course of ageing. A tripartite pattern of relevance attribution to brain regions emerges. Some regions increase in relevance with age (e.g. the right Transverse Temporal Gyrus); some decrease in relevance with age (e.g. the right Fourth Ventricle); and others are consistently relevant across ages. We also examine the effect of the Brain Age Gap (BAG) on the distribution of relevance within the brain volume. It is hoped that these findings will provide clinically relevant region-wise trajectories for normal brain ageing, and a baseline against which to compare brain ageing trajectories.
LG, Nov 12, 2021
Causal Multi-Agent Reinforcement Learning: Review and Open Problems
St John Grimbly, Jonathan Shock, Arnu Pretorius
This paper serves to introduce the reader to the field of multi-agent reinforcement learning (MARL) and its intersection with methods from the study of causality. We highlight key challenges in MARL and discuss these in the context of how causal methods may assist in tackling them. We promote moving toward a 'causality first' perspective on MARL. Specifically, we argue that causality can offer improved safety, interpretability, and robustness, while also providing strong theoretical guarantees for emergent behaviour. We discuss potential solutions for common challenges, and use this context to motivate future research directions.
LG, Oct 15, 2020
A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning
Arnu Pretorius, Scott Cameron, Elan van Biljon et al.
Multi-agent reinforcement learning has recently shown great promise as an approach to networked system control. Arguably, one of the most difficult and important tasks for which large scale networked system control is applicable is common-pool resource management. Crucial common-pool resources include arable land, fresh water, wetlands, wildlife, fish stock, forests and the atmosphere, of which proper management is related to some of society's greatest challenges such as food security, inequality and climate change. Here we take inspiration from a recent research program investigating the game-theoretic incentives of humans in social dilemma situations such as the well-known tragedy of the commons. However, instead of focusing on biologically evolved human-like agents, our concern is rather to better understand the learning and operating behaviour of engineered networked systems comprising general-purpose reinforcement learning agents, subject only to nonbiological constraints such as memory, computation and communication bandwidth. Harnessing tools from empirical game-theoretic analysis, we analyse the differences in resulting solution concepts that stem from employing different information structures in the design of networked multi-agent systems. These information structures pertain to the type of information shared between agents as well as the employed communication protocol and network topology. Our analysis contributes new insights into the consequences associated with certain design choices and provides an additional dimension of comparison between systems beyond efficiency, robustness, scalability and mean control performance.