Improving AGI Evaluation: A Data Science Perspective
This work addresses the challenge of AGI evaluation for researchers and developers. It offers a new perspective, though the contribution is incremental because it builds on existing data science methods.
The paper tackles the problem of evaluating potential AGI systems. It critiques current synthetic tasks, which are designed from intuitions about what intelligence is, and proposes an alternative philosophy, derived from data science practices, that focuses on robust task execution and seeks to demonstrate AGI through competence. Practical examples illustrate what this would mean for evaluation.
Evaluation of potential AGI systems and methods is difficult due to the breadth of the engineering goal. We have no methods for perfect evaluation of the end state, and instead measure performance on small tests designed to provide a directional indication that we are approaching AGI. In this work we argue that AGI evaluation methods have been dominated by a design philosophy that uses our intuitions of what intelligence is to create synthetic tasks, which have performed poorly in the history of AI. Instead, we argue for an alternative design philosophy focused on evaluating robust task execution, one that seeks to demonstrate AGI through competence. This perspective is developed from common practices in data science that are used to show that a system can be reliably deployed. We provide practical examples of what this would mean for AGI evaluation.