LG AI CL | Feb 11, 2025

Automated Capability Discovery via Foundation Model Self-Exploration

arXiv:2502.07577v3 | 16.95 citations | h-index: 7 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the problem of scalable, automated evaluation for AI researchers and developers. It is an incremental advance because it builds on existing ideas from open-endedness and foundation models.

The paper tackles the challenge of characterizing the diverse capabilities and risks of foundation models by introducing Automated Capability Discovery (ACD), a framework in which one foundation model proposes tasks for another. ACD automatically generates thousands of tasks and reveals dozens of capability areas and failure modes across the GPT, Claude, and Llama model series, and human surveys show high agreement between model-generated and human evaluations.

Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains through training on web-scale data. It remains challenging to precisely characterize even a fraction of the full spectrum of these abilities and potential risks in any new model. Existing evaluation approaches often require significant human effort, and it is taking increasing effort to design ever harder challenges for more capable models. We introduce Automated Capability Discovery (ACD), a framework that designates one foundation model as a scientist to systematically propose open-ended tasks probing the abilities of a subject model (potentially itself). By combining frontier models with ideas from the field of open-endedness, ACD automatically and systematically uncovers a diverse spectrum of surprising capabilities and failures in the subject model. We demonstrate ACD across a range of foundation models (including the GPT, Claude, and Llama series), showing that it automatically generates thousands of distinct tasks, which are then clustered to reveal dozens of broader capability areas and failure modes that would be challenging for any single team to uncover. We further validate our method's automated scoring with extensive human surveys, observing high agreement between model-generated and human evaluations. By leveraging foundation models' ability to both create tasks and self-evaluate, ACD is a significant step toward scalable, automated evaluation of novel AI systems. All code and evaluation logs are open-sourced at https://github.com/conglu1997/ACD.
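
The loop the abstract describes (a scientist model proposes a task, the subject model attempts it, and an automated judge scores the result) can be illustrated with a minimal Python sketch. This is not the authors' implementation; the linked repository is the reference for that. Every name here (`discover`, `agreement_rate`, `TaskRecord`, and the `toy_*` stand-ins) is hypothetical, the "models" are deterministic stub functions rather than real API calls, and the clustering step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskRecord:
    task: str      # task text proposed by the scientist
    answer: str    # subject model's response
    passed: bool   # automated judge's verdict


def discover(
    propose: Callable[[List[str]], str],
    attempt: Callable[[str], str],
    judge: Callable[[str, str], bool],
    n_rounds: int,
) -> List[TaskRecord]:
    """Run a propose -> attempt -> judge loop, keeping an archive of tasks."""
    archive: List[TaskRecord] = []
    for _ in range(n_rounds):
        seen = [record.task for record in archive]
        task = propose(seen)      # scientist sees earlier tasks
        if task in seen:          # skip exact duplicates
            continue
        answer = attempt(task)
        archive.append(TaskRecord(task, answer, judge(task, answer)))
    return archive


def agreement_rate(auto_labels: List[bool], human_labels: List[bool]) -> float:
    """Fraction of tasks where the automated and human verdicts match."""
    if len(auto_labels) != len(human_labels) or not auto_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return matches / len(auto_labels)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_scientist(seen: List[str]) -> str:
    n = len(seen) + 1
    return f"What is {n} + {n}?"


def toy_subject(task: str) -> str:
    n = int(task.split()[2])
    # Deliberately wrong for n >= 3 so the archive contains failures.
    return str(2 * n if n < 3 else 2 * n + 1)


def toy_judge(task: str, answer: str) -> bool:
    n = int(task.split()[2])
    return answer == str(2 * n)


if __name__ == "__main__":
    for record in discover(toy_scientist, toy_subject, toy_judge, n_rounds=4):
        print(record)
```

Passing the models in as plain callables keeps the loop independent of any particular provider; in practice each callable would wrap a call to a foundation model, and `agreement_rate` would compare the judge's verdicts with labels collected from human raters.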

