Lifelong Learning Metrics
This work addresses the challenge of evaluating AI systems that learn continuously, which is crucial for programs like DARPA L2M, but the contribution is incremental because it focuses on metrics rather than new learning methods.
The paper tackles the problem of measuring the performance of lifelong learning AI systems across diverse tasks such as autonomous driving and drone simulation, and it presents a formalism for constructing and characterizing agent performance in such scenarios.
The DARPA Lifelong Learning Machines (L2M) program seeks to yield advances in artificial intelligence (AI) systems so that they are capable of learning (and improving) continuously, leveraging data on one task to improve performance on another, and doing so in a computationally sustainable way. Performers on this program developed systems capable of performing a diverse range of functions, including autonomous driving, real-time strategy, and drone simulation. These systems featured a diverse range of characteristics (e.g., task structure, lifetime duration), and an immediate challenge faced by the program's testing and evaluation team was measuring system performance across these different settings. This document, developed in close collaboration with DARPA and the program performers, outlines a formalism for constructing and characterizing the performance of agents performing lifelong learning scenarios.