LG, ML | Jul 6, 2021

Principles for Evaluation of AI/ML Model Performance and Robustness

Olivia Brown, Andrew Curtis, Justin Goodwin

arXiv:2107.02868v1

Originality: Synthesis-oriented

AI Analysis

The paper tackles the critical problem of AI/ML robustness for DoD evaluators, but it is incremental: it reviews existing practices rather than introducing new methods.

The paper addresses the need for robust AI/ML model evaluation in the Department of Defense (DoD) to prevent the deployment of brittle systems. It recommends best practices and a methodical process for ensuring reliability in national security contexts.

The Department of Defense (DoD) has significantly increased its investment in the design, evaluation, and deployment of Artificial Intelligence and Machine Learning (AI/ML) capabilities to address national security needs. While there are numerous AI/ML successes in the academic and commercial sectors, many of these systems have also been shown to be brittle and nonrobust. In a complex and ever-changing national security environment, it is vital that the DoD establish a sound and methodical process to evaluate the performance and robustness of AI/ML models before these new capabilities are deployed to the field. This paper reviews the AI/ML development process, highlights common best practices for AI/ML model evaluation, and makes recommendations to DoD evaluators to ensure the deployment of robust AI/ML capabilities for national security needs.

