LGMay 22
Graph-based Complexity Forecasts in UK En Route Airspace Using Relevant Aircraft InteractionsEdward Henderson, George De Ath, Nick Pepper
Effectively managing Air Traffic Control Officer (ATCO) workload is crucial in maintaining operational safety. Group supervisors use tools that estimate upcoming traffic load to aid decision-making. However, industry-standard models can fail to capture the nuances of upcoming air traffic complexity. This study presents a probabilistic approach to forecast the complexity of an airspace sector using the number of relevant aircraft pairs, i.e., those that require monitoring or deconfliction by a controller, as a proxy measure for ATCO workload. We adapted an existing filter algorithm to make it suitable for use in London Middle Sector (LMS), a complex airspace sector with multiple flows of traffic above some of the busiest airports in Europe. Through iterative feedback with ATCOs, the algorithm was refined and extended to handle specific geometric and operational considerations. The updated algorithm outperformed the original, with an F1-score of 0.84 compared to 0.69 on a labelled set of 50 traffic scenarios. To produce forecasts of future numbers of relevant aircraft pairs in the sector, a graph representation of the LMS route network was constructed, standardising the spatial fidelity of route legs. The forecasting method accounts for uncertainty in aircraft arrival times by modelling the probability of each aircraft occupying route segments at future query times. When combined with historic distributions of relevant interactions and a live operational data stream, predictions of upcoming ATCO workload could be made up to 45 minutes in advance. The proposed method to forecast upcoming workload showed a significantly stronger correlation with actual relevant interactions (Spearman's $ρ= 0.68$) than a standard traffic volume prediction ($ρ= 0.55$). The resulting data-driven tool shows promise for use by group supervisors to inform sector configuration and ATCO rostering decisions.
SYSep 26, 2023
Context-Aware Generative Models for Prediction of Aircraft Ground TracksNick Pepper, George De Ath, Marc Thomas et al.
Trajectory prediction (TP) plays an important role in supporting the decision-making of Air Traffic Controllers (ATCOs). Traditional TP methods are deterministic and physics-based, with parameters that are calibrated using aircraft surveillance data harvested across the world. These models are, therefore, agnostic to the intentions of the pilots and ATCOs, which can have a significant effect on the observed trajectory, particularly in the lateral plane. This work proposes a generative method for lateral TP, using probabilistic machine learning to model the effect of the epistemic uncertainty arising from the unknown effect of pilot behaviour and ATCO intentions. The models are trained to be specific to a particular sector, allowing local procedures such as coordinated entry and exit points to be modelled. A dataset comprising a week's worth of aircraft surveillance data, passing through a busy sector of the United Kingdom's upper airspace, was used to train and test the models. Specifically, a piecewise linear model was used as a functional, low-dimensional representation of the ground tracks, with its control points determined by a generative model conditioned on partial context. It was found that, of the investigated models, a Bayesian Neural Network using the Laplace approximation was able to generate the most plausible trajectories in order to emulate the flow of traffic through the sector.
HCJan 7
Human-in-the-Loop Testing of AI Agents for Air Traffic Control with a Regulated Assessment FrameworkBen Carvell, Marc Thomas, Andrew Pace et al.
We present a rigorous, human-in-the-loop evaluation framework for assessing the performance of AI agents on the task of Air Traffic Control, grounded in a regulator-certified simulator-based curriculum used for training and testing real-world trainee controllers. By leveraging legally regulated assessments and involving expert human instructors in the evaluation process, our framework enables a more authentic and domain-accurate measurement of AI performance. This work addresses a critical gap in the existing literature: the frequent misalignment between academic representations of Air Traffic Control and the complexities of the actual operational environment. It also lays the foundations for effective future human-machine teaming paradigms by aligning machine performance with human assessment targets.
LGJan 7
Online Action-Stacking Improves Reinforcement Learning Performance for Air Traffic ControlBen Carvell, George De Ath, Eseoghene Benjamin et al.
We introduce online action-stacking, an inference-time wrapper for reinforcement learning policies that produces realistic air traffic control commands while allowing training on a much smaller discrete action space. Policies are trained with simple incremental heading or level adjustments, together with an action-damping penalty that reduces instruction frequency and leads agents to issue commands in short bursts. At inference, online action-stacking compiles these bursts of primitive actions into domain-appropriate compound clearances. Using Proximal Policy Optimisation and the BluebirdDT digital twin platform, we train agents to navigate aircraft along lateral routes, manage climb and descent to target flight levels, and perform two-aircraft collision avoidance under a minimum separation constraint. In our lateral navigation experiments, action stacking greatly reduces the number of issued instructions relative to a damped baseline and achieves comparable performance to a policy trained with a 37-dimensional action space, despite operating with only five actions. These results indicate that online action-stacking helps bridge a key gap between standard reinforcement learning formulations and operational ATC requirements, and provides a simple mechanism for scaling to more complex control scenarios.
AIJan 7
A Future Capabilities Agent for Tactical Air Traffic ControlPaul Kent, George De Ath, Martin Layton et al.
Escalating air traffic demand is driving the adoption of automation to support air traffic controllers, but existing approaches face a trade-off between safety assurance and interpretability. Optimisation-based methods such as reinforcement learning offer strong performance but are difficult to verify and explain, while rules-based systems are transparent yet rarely check safety under uncertainty. This paper outlines Agent Mallard, a forward-planning, rules-based agent for tactical control in systemised airspace that embeds a stochastic digital twin directly into its conflict-resolution loop. Mallard operates on predefined GPS-guided routes, reducing continuous 4D vectoring to discrete choices over lanes and levels, and constructs hierarchical plans from an expert-informed library of deconfliction strategies. A depth-limited backtracking search uses causal attribution, topological plan splicing, and monotonic axis constraints to seek a complete safe plan for all aircraft, validating each candidate manoeuvre against uncertain execution scenarios (e.g., wind variation, pilot response, communication loss) before commitment. Preliminary walkthroughs with UK controllers and initial tests in the BluebirdDT airspace digital twin indicate that Mallard's behaviour aligns with expert reasoning and resolves conflicts in simplified scenarios. The architecture is intended to combine model-based safety assessment, interpretable decision logic, and tractable computational performance in future structured en-route environments.
AIAug 4, 2025
AirTrafficGen: Configurable Air Traffic Scenario Generation with Large Language ModelsDewi Sid William Gould, George De Ath, Ben Carvell et al.
The manual design of scenarios for Air Traffic Control (ATC) training is a demanding and time-consuming bottleneck that limits the diversity of simulations available to controllers. To address this, we introduce a novel, end-to-end approach, $\texttt{AirTrafficGen}$, that leverages large language models (LLMs) to automate and control the generation of complex ATC scenarios. Our method uses a purpose-built, graph-based representation to encode sector topology (including airspace geometry, routes, and fixes) into a format LLMs can process. Through rigorous benchmarking, we show that state-of-the-art models like Gemini 2.5 Pro, OpenAI o3, GPT-oss-120b and GPT-5 can generate high-traffic scenarios while maintaining operational realism. Our engineered prompting enables fine-grained control over interaction presence, type, and location. Initial findings suggest these models are also capable of iterative refinement, correcting flawed scenarios based on simple textual feedback. This approach provides a scalable alternative to manual scenario design, addressing the need for a greater volume and variety of ATC training and validation simulations. More broadly, this work showcases the potential of LLMs for complex planning in safety-critical domains.
LGJul 17, 2025
Air Traffic Controller Task Demand via Graph Neural Networks: An Interpretable Approach to Airspace ComplexityEdward Henderson, Dewi Gould, Richard Everson et al.
Real-time assessment of near-term Air Traffic Controller (ATCO) task demand is a critical challenge in an increasingly crowded airspace, as existing complexity metrics often fail to capture nuanced operational drivers beyond simple aircraft counts. This work introduces an interpretable Graph Neural Network (GNN) framework to address this gap. Our attention-based model predicts the number of upcoming clearances, the instructions issued to aircraft by ATCOs, from interactions within static traffic scenarios. Crucially, we derive an interpretable, per-aircraft task demand score by systematically ablating aircraft and measuring the impact on the model's predictions. Our framework significantly outperforms an ATCO-inspired heuristic and is a more reliable estimator of scenario complexity than established baselines. The resulting tool can attribute task demand to specific aircraft, offering a new way to analyse and understand the drivers of complexity for applications in controller training and airspace redesign.
LGMar 31, 2022
MBORE: Multi-objective Bayesian Optimisation by Density-Ratio EstimationGeorge De Ath, Tinkle Chugh, Alma A. M. Rahat
Optimisation problems often have multiple conflicting objectives that can be computationally and/or financially expensive. Mono-surrogate Bayesian optimisation (BO) is a popular model-based approach for optimising such black-box functions. It combines objective values via scalarisation and builds a Gaussian process (GP) surrogate of the scalarised values. The location which maximises a cheap-to-query acquisition function is chosen as the next location to expensively evaluate. While BO is an effective strategy, the use of GPs is limiting. Their performance decreases as the problem input dimensionality increases, and their computational complexity scales cubically with the amount of data. To address these limitations, we extend previous work on BO by density-ratio estimation (BORE) to the multi-objective setting. BORE links the computation of the probability of improvement acquisition function to that of probabilistic classification. This enables the use of state-of-the-art classifiers in a BO-like framework. In this work we present MBORE: multi-objective Bayesian optimisation by density-ratio estimation, and compare it to BO across a range of synthetic and real-world benchmarks. We find that MBORE performs as well as or better than BO on a wide variety of problems, and that it outperforms BO on high-dimensional and real-world problems.
LGMay 3, 2021
How Bayesian Should Bayesian Optimisation Be?George De Ath, Richard Everson, Jonathan Fieldsend
Bayesian optimisation (BO) uses probabilistic surrogate models - usually Gaussian processes (GPs) - for the optimisation of expensive black-box functions. At each BO iteration, the GP hyperparameters are fit to previously-evaluated data by maximising the marginal likelihood. However, this fails to account for uncertainty in the hyperparameters themselves, leading to overconfident model predictions. This uncertainty can be accounted for by taking the Bayesian approach of marginalising out the model hyperparameters. We investigate whether a fully-Bayesian treatment of the Gaussian process hyperparameters in BO (FBBO) leads to improved optimisation performance. Since an analytic approach is intractable, we compare FBBO using three approximate inference schemes to the maximum likelihood approach, using the Expected Improvement (EI) and Upper Confidence Bound (UCB) acquisition functions paired with ARD and isotropic Matern kernels, across 15 well-known benchmark problems for 4 observational noise settings. FBBO using EI with an ARD kernel leads to the best performance in the noise-free setting, with much less difference between combinations of BO components when the noise is increased. FBBO leads to over-exploration with UCB, but is not detrimental with EI. Therefore, we recommend that FBBO using EI with an ARD kernel as the default choice for BO.
LGOct 15, 2020
Asynchronous ε-Greedy Bayesian OptimisationGeorge De Ath, Richard M. Everson, Jonathan E. Fieldsend
Batch Bayesian optimisation (BO) is a successful technique for the optimisation of expensive black-box functions. Asynchronous BO can reduce wallclock time by starting a new evaluation as soon as another finishes, thus maximising resource utilisation. To maximise resource allocation, we develop a novel asynchronous BO method, AEGiS (Asynchronous $ε$-Greedy Global Search) that combines greedy search, exploiting the surrogate's mean prediction, with Thompson sampling and random selection from the approximate Pareto set describing the trade-off between exploitation (surrogate mean prediction) and exploration (surrogate posterior variance). We demonstrate empirically the efficacy of AEGiS on synthetic benchmark problems, meta-surrogate hyperparameter tuning problems and real-world problems, showing that AEGiS generally outperforms existing methods for asynchronous BO. When a single worker is available performance is no worse than BO using expected improvement.
LGApr 17, 2020
What do you Mean? The Role of the Mean Function in Bayesian OptimisationGeorge De Ath, Jonathan E. Fieldsend, Richard M. Everson
Bayesian optimisation is a popular approach for optimising expensive black-box functions. The next location to be evaluated is selected via maximising an acquisition function that balances exploitation and exploration. Gaussian processes, the surrogate models of choice in Bayesian optimisation, are often used with a constant prior mean function equal to the arithmetic mean of the observed function values. We show that the rate of convergence can depend sensitively on the choice of mean function. We empirically investigate 8 mean functions (constant functions equal to the arithmetic mean, minimum, median and maximum of the observed function evaluations, linear, quadratic polynomials, random forests and RBF networks), using 10 synthetic test problems and two real-world problems, and using the Expected Improvement and Upper Confidence Bound acquisition functions. We find that for design dimensions $\ge5$ using a constant mean function equal to the worst observed quality value is consistently the best choice on the synthetic problems considered. We argue that this worst-observed-quality function promotes exploitation leading to more rapid convergence. However, for the real-world tasks the more complex mean functions capable of modelling the fitness landscape may be effective, although there is no clearly optimum choice.
LGFeb 5, 2020
$ε$-shotgun: $ε$-greedy Batch Bayesian OptimisationGeorge De Ath, Richard M. Everson, Jonathan E. Fieldsend et al.
Bayesian optimisation is a popular, surrogate model-based approach for optimising expensive black-box functions. Given a surrogate model, the next location to expensively evaluate is chosen via maximisation of a cheap-to-query acquisition function. We present an $ε$-greedy procedure for Bayesian optimisation in batch settings in which the black-box function can be evaluated multiple times in parallel. Our $ε$-shotgun algorithm leverages the model's prediction, uncertainty, and the approximated rate of change of the landscape to determine the spread of batch solutions to be distributed around a putative location. The initial target location is selected either in an exploitative fashion on the mean prediction, or -- with probability $ε$ -- from elsewhere in the design space. This results in locations that are more densely sampled in regions where the function is changing rapidly and in locations predicted to be good (i.e close to predicted optima), with more scattered samples in regions where the function is flatter and/or of poorer quality. We empirically evaluate the $ε$-shotgun methods on a range of synthetic functions and two real-world problems, finding that they perform at least as well as state-of-the-art batch methods and in many cases exceed their performance.
LGNov 28, 2019
Greed is Good: Exploration and Exploitation Trade-offs in Bayesian OptimisationGeorge De Ath, Richard M. Everson, Alma A. M. Rahat et al.
The performance of acquisition functions for Bayesian optimisation to locate the global optimum of continuous functions is investigated in terms of the Pareto front between exploration and exploitation. We show that Expected Improvement (EI) and the Upper Confidence Bound (UCB) always select solutions to be expensively evaluated on the Pareto front, but Probability of Improvement is not guaranteed to do so and Weighted Expected Improvement does so only for a restricted range of weights. We introduce two novel $ε$-greedy acquisition functions. Extensive empirical evaluation of these together with random search, purely exploratory, and purely exploitative search on 10 benchmark problems in 1 to 10 dimensions shows that $ε$-greedy algorithms are generally at least as effective as conventional acquisition functions (e.g., EI and UCB), particularly with a limited budget. In higher dimensions $ε$-greedy approaches are shown to have improved performance over conventional approaches. These results are borne out on a real world computational fluid dynamics optimisation problem and a robotics active learning problem. Our analysis and experiments suggest that the most effective strategy, particularly in higher dimensions, is to be mostly greedy, occasionally selecting a random exploratory solution.
CVMay 22, 2018
Part-based Tracking by SamplingGeorge De Ath, Richard M. Everson
We propose a novel part-based method for tracking an arbitrary object in challenging video sequences. The colour distribution of tracked image patches on the target object are represented by pairs of RGB samples and counts of how many pixels in the patch are similar to them. Patches are placed by segmenting the object in the given bounding box and placing patches in homogeneous regions of the object. These are located in subsequent image frames by applying non-shearing affine transformations to the patches' previous locations, locally optimising the best of these, and evaluating their quality using a modified Bhattacharyya distance. In experiments carried out on VOT2018 and OTB100 benchmarks, the tracker achieves higher performance than all other part-based trackers. An ablation study is used to reveal the effectiveness of each tracking component, with largest performance gains found when using the patch placement scheme.
CVMay 3, 2018
Visual Object Tracking: The Initialisation ProblemGeorge De Ath, Richard Everson
Model initialisation is an important component of object tracking. Tracking algorithms are generally provided with the first frame of a sequence and a bounding box (BB) indicating the location of the object. This BB may contain a large number of background pixels in addition to the object and can lead to parts-based tracking algorithms initialising their object models in background regions of the BB. In this paper, we tackle this as a missing labels problem, marking pixels sufficiently away from the BB as belonging to the background and learning the labels of the unknown pixels. Three techniques, One-Class SVM (OC-SVM), Sampled-Based Background Model (SBBM) (a novel background model based on pixel samples), and Learning Based Digital Matting (LBDM), are adapted to the problem. These are evaluated with leave-one-video-out cross-validation on the VOT2016 tracking benchmark. Our evaluation shows both OC-SVMs and SBBM are capable of providing a good level of segmentation accuracy but are too parameter-dependent to be used in real-world scenarios. We show that LBDM achieves significantly increased performance with parameters selected by cross validation and we show that it is robust to parameter variation.