Sebastian Towers

PE
h-index: 14
6 papers
14 citations
Novelty: 69%
AI Score: 55

6 Papers

PE | Jun 6, 2025 | Code
ADIOS: Antibody Development via Opponent Shaping

Sebastian Towers, Aleksandra Kalisz, Philippe A. Robert et al.

Anti-viral therapies are typically designed to target only the current strains of a virus, a myopic response. However, therapy-induced selective pressures drive the emergence of new viral strains, against which the original myopic therapies are no longer effective. This evolutionary response presents an opportunity: our therapies could both defend against and actively influence viral evolution. This motivates our method ADIOS: Antibody Development vIa Opponent Shaping. ADIOS is a meta-learning framework where the process of antibody therapy design, the outer loop, accounts for the virus's adaptive response, the inner loop. With ADIOS, antibodies are not only robust against potential future variants, they also influence, i.e., shape, which future variants emerge. In line with the opponent shaping literature, we refer to our optimised antibodies as shapers. To demonstrate the value of ADIOS, we build a viral evolution simulator using the Absolut! framework, in which shapers successfully target both current and future viral variants, outperforming myopic antibodies. Furthermore, we show that shapers modify the distribution over viral evolutionary trajectories to result in weaker variants. We believe that our ADIOS paradigm will facilitate the discovery of long-lived vaccines and antibody therapies while also generalising to other domains with evolutionarily adaptive opponents, such as antimicrobial resistance and cancer treatment. Our code is available at https://github.com/olakalisz/adios.
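
The outer-loop/inner-loop structure described in this abstract can be illustrated with a very small toy. The sketch below is not the ADIOS implementation and does not use the Absolut! framework: the residue alphabet, the match-counting `binding` score, the greedy escape rule and all function names are assumptions made purely for illustration. The inner loop lets a virus mutate away from a fixed antibody; the outer loop scores each candidate antibody either against the current strain only (myopic) or along the whole simulated trajectory (shaper).

```python
from itertools import product

ALPHABET = "ACD"  # toy residue alphabet (assumption)
LENGTH = 4        # toy sequence length (assumption)
HORIZON = 3       # viral adaptation steps simulated in the inner loop

def binding(antibody, virus):
    # Hypothetical stand-in for a binding model: count matching positions.
    return sum(a == v for a, v in zip(antibody, virus))

def single_mutants(virus):
    # Every sequence that differs from the virus at exactly one position.
    for i, old in enumerate(virus):
        for new in ALPHABET:
            if new != old:
                yield virus[:i] + new + virus[i + 1:]

def inner_loop(antibody, virus):
    # Inner loop: the virus greedily escapes the antibody, one mutation per step.
    trajectory = [virus]
    for _ in range(HORIZON):
        escape = min(single_mutants(virus), key=lambda m: binding(antibody, m))
        if binding(antibody, escape) < binding(antibody, virus):
            virus = escape
        trajectory.append(virus)
    return trajectory

def horizon_score(antibody, virus):
    # Outer-loop objective: total binding along the simulated viral trajectory.
    return sum(binding(antibody, v) for v in inner_loop(antibody, virus))

def candidates():
    return ["".join(p) for p in product(ALPHABET, repeat=LENGTH)]

def myopic_antibody(virus):
    # Optimises against the current strain only.
    return max(candidates(), key=lambda ab: binding(ab, virus))

def shaper_antibody(virus):
    # Optimises against the virus's simulated adaptive response.
    return max(candidates(), key=lambda ab: horizon_score(ab, virus))

wild_type = "ACDA"
myopic = myopic_antibody(wild_type)
shaper = shaper_antibody(wild_type)
print(myopic, horizon_score(myopic, wild_type))
print(shaper, horizon_score(shaper, wild_type))
```

Because the shaper maximises the horizon score over all candidates, it can never score worse on that objective than the myopic choice. In a toy this simple the two objectives can coincide; a richer binding model or viral fitness landscape would be needed for them to differ.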

PE | Sep 16, 2024 | Code
ADIOS: Antibody Development via Opponent Shaping

Sebastian Towers, Aleksandra Kalisz, Philippe A. Robert et al.


LG | Mar 16, 2023
Arbitrary Order Meta-Learning with Simple Population-Based Evolution

Chris Lu, Sebastian Towers, Jakob Foerster

Meta-learning, the notion of learning to learn, enables learning systems to quickly and flexibly solve new tasks. This usually involves defining a set of outer-loop meta-parameters that are then used to update a set of inner-loop parameters. Most meta-learning approaches use complicated and computationally expensive bi-level optimisation schemes to update these meta-parameters. Ideally, systems should perform multiple orders of meta-learning, i.e. to learn to learn to learn and so on, to accelerate their own learning. Unfortunately, standard meta-learning techniques are often inappropriate for these higher-order meta-parameters because the meta-optimisation procedure becomes too complicated or unstable. Inspired by the higher-order meta-learning we observe in real-world evolution, we show that using simple population-based evolution implicitly optimises for arbitrarily-high order meta-parameters. First, we theoretically prove and empirically show that population-based evolution implicitly optimises meta-parameters of arbitrarily-high order in a simple setting. We then introduce a minimal self-referential parameterisation, which in principle enables arbitrary-order meta-learning. Finally, we show that higher-order meta-learning improves performance on time series forecasting tasks.
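
As a rough illustration of how selection on fitness alone can tune higher-order parameters, the sketch below evolves individuals that each carry a chain of parameters. It is not the paper's exact setting or parameterisation: the additive update rule (each order is shifted by the order above it), the noise scale, the target value and truncation selection are all assumptions. Only the order-0 parameter is scored, so any useful values at higher orders survive purely because they produce good lower-order updates.

```python
import random

def fitness(individual):
    # Only the order-0 parameter is scored; the target 10.0 is arbitrary.
    return -abs(individual[0] - 10.0)

def mutate(individual, rng, noise=0.1):
    # The highest order receives random noise; every lower order is then
    # shifted by the (already updated) order directly above it.
    child = list(individual)
    child[-1] += rng.gauss(0.0, noise)
    for k in range(len(child) - 2, -1, -1):
        child[k] += child[k + 1]
    return child

def evolve(orders=3, pop_size=50, generations=100, seed=0):
    rng = random.Random(seed)
    population = [[0.0] * orders for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]          # truncation selection
        children = [mutate(p, rng) for p in parents]   # one child per parent
        population = parents + children                # parents survive unchanged
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history

best, history = evolve()
print(best, history[0], history[-1])
```

Keeping the parents unchanged makes the best fitness non-decreasing across generations, which keeps the toy easy to reason about; it is a convenience of this sketch rather than something the abstract specifies.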

AI | Feb 23 | Code
Recurrent Structural Policy Gradient for Partially Observable Mean Field Games

Clarisse Wibault, Johannes Forkel, Sebastian Towers et al.

Mean Field Games (MFGs) provide a principled framework for modeling interactions in large population models: at scale, population dynamics become deterministic, with uncertainty entering only through aggregate shocks, or common noise. However, algorithmic progress has been limited since model-free methods suffer from high variance and exact methods scale poorly. Recent Hybrid Structural Methods (HSMs) use Monte Carlo rollouts for the common noise in combination with exact estimation of the expected return, conditioned on those samples. However, HSMs have not been scaled to partially observable settings. We propose Recurrent Structural Policy Gradient (RSPG), the first history-aware HSM for settings involving public information. We also introduce MFAX, our JAX-based framework for MFGs. By leveraging known transition dynamics, RSPG achieves state-of-the-art performance as well as an order-of-magnitude faster convergence and solves, for the first time, a macroeconomics MFG with heterogeneous agents, common noise and history-aware policies. MFAX is publicly available at: https://github.com/CWibault/mfax.
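
The hybrid idea mentioned in this abstract, sampling the common noise while computing the expected return exactly given each sample, can be sketched on a two-state toy. This is neither RSPG nor the MFAX API: the states, the transition probabilities, the reward with its congestion term, the scalar `effort` policy parameter and the shock probability are all hypothetical, and the sketch covers only return estimation, with no recurrence, partial observability or policy gradient.

```python
import random

STATES = (0, 1)  # illustrative labels, e.g. 0 = idle, 1 = active

def transition(effort, shock):
    # Row-stochastic matrix P[s][s_next], given the policy parameter and the
    # common shock (0 or 1). All numbers are made up for illustration.
    p_up = 0.3 + 0.4 * effort - 0.2 * shock
    p_stay = 0.9 - 0.3 * shock
    return [[1.0 - p_up, p_up], [1.0 - p_stay, p_stay]]

def reward(state, effort, mu):
    # Mean-field interaction: being active pays less when many agents are active.
    return state * (1.0 - 0.5 * mu[1]) - 0.1 * effort

def conditional_return(effort, shocks, mu0=(1.0, 0.0)):
    # Exact expected return given one common-noise path: the population
    # distribution mu is propagated deterministically, with no agent sampling.
    mu, total = list(mu0), 0.0
    for shock in shocks:
        total += sum(mu[s] * reward(s, effort, mu) for s in STATES)
        P = transition(effort, shock)
        mu = [sum(mu[s] * P[s][t] for s in STATES) for t in STATES]
    return total

def hybrid_estimate(effort, horizon=10, n_samples=200, seed=0):
    # Monte Carlo over the common noise only; exact expectation inside.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        shocks = [1 if rng.random() < 0.3 else 0 for _ in range(horizon)]
        total += conditional_return(effort, shocks)
    return total / n_samples

print(hybrid_estimate(0.5))
```

The only source of variance in `hybrid_estimate` is the sampled shock path, which is the property that distinguishes this kind of estimator from a fully model-free rollout over individual agents.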

SOC-PH | Nov 7, 2025
A differentiable model of supply-chain shocks

Saad Hamid, José Moran, Luca Mungo et al.

Modelling how shocks propagate in supply chains is an increasingly important challenge in economics. Its relevance has been highlighted in recent years by events such as Covid-19 and the Russian invasion of Ukraine. Agent-based models (ABMs) are a promising approach for this problem. However, calibrating them is hard. We show empirically that it is possible to achieve speed-ups of over 3 orders of magnitude when calibrating ABMs of supply networks by running them on GPUs and using automatic differentiation, compared to non-differentiable baselines. This opens the door to scaling ABMs to model the whole global supply network.
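
To make gradient-based calibration concrete, the sketch below differentiates through a tiny simulator and fits its single parameter by gradient descent. It is not the paper's model and involves no GPU: the one-variable recovery dynamics, the `Dual` forward-mode class (a stand-in for a real automatic differentiation library), the learning rate and all names are assumptions chosen to keep the example self-contained.

```python
class Dual:
    """A value together with its derivative w.r.t. one parameter."""

    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__

def simulate(recovery_rate, steps=10, shocked_output=0.5):
    # Toy dynamics: output relaxes back towards full capacity 1.0 after a shock.
    x, path = shocked_output, []
    for _ in range(steps):
        x = x + recovery_rate * (1.0 - x)
        path.append(x)
    return path

def loss(recovery_rate, observed):
    # Sum of squared errors between simulated and observed output.
    total = 0.0
    for x, obs in zip(simulate(recovery_rate, len(observed)), observed):
        err = x - obs
        total = total + err * err
    return total

def calibrate(observed, guess=0.1, lr=0.01, iters=500):
    # Gradient descent, with the exact gradient obtained by seeding deriv=1.
    r = guess
    for _ in range(iters):
        r -= lr * loss(Dual(r, 1.0), observed).deriv
    return r

observed = simulate(0.3)      # synthetic data from a known recovery rate
print(calibrate(observed))    # should land close to 0.3
```

A non-differentiable baseline would have to search over `recovery_rate` using loss values alone; here each simulator run also yields the exact gradient, which is the ingredient the abstract credits for faster calibration.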

ML | Feb 7, 2024
Learning mirror maps in policy mirror descent

Carlo Alfano, Sebastian Towers, Silvia Sapora et al.

Policy Mirror Descent (PMD) is a popular framework in reinforcement learning, serving as a unifying perspective that encompasses numerous algorithms. These algorithms are derived through the selection of a mirror map and enjoy finite-time convergence guarantees. Despite its popularity, the exploration of PMD's full potential is limited, with the majority of research focusing on a particular mirror map -- namely, the negative entropy -- which gives rise to the renowned Natural Policy Gradient (NPG) method. It remains uncertain from existing theoretical studies whether the choice of mirror map significantly influences PMD's efficacy. In our work, we conduct empirical investigations to show that the conventional mirror map choice (NPG) often yields less-than-optimal outcomes across several standard benchmark environments. Using evolutionary strategies, we identify more efficient mirror maps that enhance the performance of PMD. We first focus on a tabular environment, i.e. Grid-World, where we relate existing theoretical bounds with the performance of PMD for a few standard mirror maps and the learned one. We then show that it is possible to learn a mirror map that outperforms the negative entropy in more complex environments, such as the MinAtar suite. Our results suggest that mirror maps generalize well across various environments, raising questions about how to best match a mirror map to an environment's structure and characteristics.
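
The role of the mirror map can be seen in a single-state example. The sketch below compares two standard choices: the negative entropy, whose PMD step is a multiplicative (softmax-style) update, and the squared Euclidean norm, whose step is a gradient step followed by projection onto the probability simplex. It does not reproduce the learned mirror maps from the paper; the three-action problem, its fixed `q_values` and the step size are made up for illustration.

```python
import math

def pmd_step_negative_entropy(policy, q_values, eta):
    # Negative-entropy mirror map: new policy is proportional to
    # policy * exp(eta * Q), the NPG-style multiplicative update.
    weights = [p * math.exp(eta * q) for p, q in zip(policy, q_values)]
    normaliser = sum(weights)
    return [w / normaliser for w in weights]

def project_to_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based method).
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]

def pmd_step_euclidean(policy, q_values, eta):
    # Squared-Euclidean mirror map: gradient step, then projection.
    return project_to_simplex([p + eta * q for p, q in zip(policy, q_values)])

def run(step, q_values, eta=0.5, iterations=50):
    policy = [1.0 / len(q_values)] * len(q_values)
    for _ in range(iterations):
        policy = step(policy, q_values, eta)
    return policy

q_values = [1.0, 0.0, 0.5]  # hypothetical action values, held fixed
print(run(pmd_step_negative_entropy, q_values))
print(run(pmd_step_euclidean, q_values))
```

Both updates move probability towards the best action, but along different paths: the projected update can assign exactly zero probability to an action after a few steps, whereas the multiplicative update keeps every action's probability strictly positive.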