How to design a dataset compliant with an ML-based system ODD?
This work addresses dataset compliance for the certification of ML systems in safety-critical domains such as aviation. Its contribution is incremental: it builds on emerging certification standards rather than introducing a new paradigm.
The paper tackles the problem of designing datasets that meet Operational Design Domain (ODD) requirements for certifying ML systems in safety-critical applications, using a vision-based landing task as its case study. It presents a replicable framework that translates system constraints into verifiable data quality requirements and illustrates their verification on the LARD dataset.
This paper focuses on a vision-based landing task and presents the design and validation of a dataset that complies with the Operational Design Domain (ODD) of a Machine Learning (ML) system. Relying on emerging certification standards, we describe the process for establishing ODDs at both the system and image levels. In doing so, we translate high-level system constraints into actionable image-level properties, which allows us to define verifiable Data Quality Requirements (DQRs). To illustrate this approach, we use the Landing Approach Runway Detection (LARD) dataset, which combines synthetic imagery and real footage, and we focus on the steps required to verify the DQRs. The replicable framework presented in this paper addresses the challenges of designing a dataset that meets the stringent needs of ML-based system certification in safety-critical applications.
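As a rough illustration of what verifying a DQR against an ODD could look like in practice, the sketch below checks two hypothetical requirements on per-image metadata: every sample lies inside the ODD, and every bin of every ODD parameter is populated by a minimum number of samples. The parameter names (`distance_nm`, `pitch_deg`), their ranges, the binning, and the `OddRange` and `check_dqrs` helpers are all assumptions made for this example; they are not taken from the paper or from the LARD dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OddRange:
    """One ODD parameter with a closed interval [low, high], split into equal bins."""
    name: str
    low: float
    high: float
    bins: int


def bin_index(rng, value):
    """Return the bin of `value` inside `rng`, or None if it lies outside the ODD."""
    if not (rng.low <= value <= rng.high):
        return None
    width = (rng.high - rng.low) / rng.bins
    # Clamp so that value == high falls into the last bin.
    return min(int((value - rng.low) / width), rng.bins - 1)


def check_dqrs(samples, odd, min_per_bin):
    """Check two hypothetical DQRs:
    (1) compliance: every sample lies inside the ODD;
    (2) coverage: every bin of every parameter holds >= min_per_bin samples.
    Samples outside the ODD are reported and excluded from the coverage counts."""
    out_of_odd = []
    counts = {rng.name: [0] * rng.bins for rng in odd}
    for i, sample in enumerate(samples):
        idx = {rng.name: bin_index(rng, sample[rng.name]) for rng in odd}
        if any(b is None for b in idx.values()):
            out_of_odd.append(i)
            continue
        for name, b in idx.items():
            counts[name][b] += 1
    under_covered = {
        name: [b for b, c in enumerate(cs) if c < min_per_bin]
        for name, cs in counts.items()
    }
    return out_of_odd, counts, under_covered


# Illustrative values only; not taken from the paper or from LARD.
odd = [OddRange("distance_nm", 0.0, 3.0, 3), OddRange("pitch_deg", 2.0, 4.0, 2)]
samples = [
    {"distance_nm": 0.5, "pitch_deg": 2.5},
    {"distance_nm": 1.5, "pitch_deg": 3.5},
    {"distance_nm": 3.5, "pitch_deg": 3.0},  # outside the assumed ODD
    {"distance_nm": 3.0, "pitch_deg": 4.0},
]
out_of_odd, counts, under_covered = check_dqrs(samples, odd, min_per_bin=1)
```

Checking each parameter independently keeps the sketch short; a real verification would likely also need joint coverage across parameters and image-level properties, which this example does not attempt.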