HC LG MA · Jan 7

Human-in-the-Loop Testing of AI Agents for Air Traffic Control with a Regulated Assessment Framework

Ben Carvell, Marc Thomas, Andrew Pace, Christopher Dorney, George De Ath, Richard Everson, Nick Pepper, Adam Keane, Samuel Tomlinson, Richard Cannon

arXiv:2601.04288v19.36 citations · AIAA SCITECH 2026 Forum

Originality: Synthesis-oriented

AI Analysis

This work addresses a critical gap for aviation safety by aligning AI evaluation with real-world operational complexity, though its contribution is incremental: it applies existing regulated assessments to AI agents.

The paper tackles the problem of evaluating AI agents for Air Traffic Control by developing a human-in-the-loop testing framework based on regulator-certified simulators, which yields a more authentic and domain-accurate assessment method.

We present a rigorous, human-in-the-loop evaluation framework for assessing the performance of AI agents on the task of Air Traffic Control, grounded in a regulator-certified simulator-based curriculum used for training and testing real-world trainee controllers. By leveraging legally regulated assessments and involving expert human instructors in the evaluation process, our framework enables a more authentic and domain-accurate measurement of AI performance. This work addresses a critical gap in the existing literature: the frequent misalignment between academic representations of Air Traffic Control and the complexities of the actual operational environment. It also lays the foundations for effective future human-machine teaming paradigms by aligning machine performance with human assessment targets.
