AI CL CV LG | Feb 10, 2019

EvalAI: Towards Better Evaluation Systems for AI Agents

Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra

arXiv:1902.03570v1 | 25.974 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

EvalAI addresses the need for better evaluation systems for researchers, students, and data scientists, though the contribution is incremental because it builds on existing challenge platforms.

The authors tackle the problem of evaluating AI agents by introducing EvalAI, an open-source platform that provides a scalable way to benchmark machine learning models and agents. By simplifying and standardizing the benchmarking process, the platform aims to increase the rate of measurable progress in AI.

We introduce EvalAI, an open-source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale. EvalAI is built to provide a scalable solution to the research community to fulfill the critical need of evaluating machine learning models and agents acting in an environment against annotations or with a human-in-the-loop. This will help researchers, students, and data scientists to create, collaborate, and participate in AI challenges organized around the globe. By simplifying and standardizing the process of benchmarking these models, EvalAI seeks to lower the barrier to entry for participating in the global scientific effort to push the frontiers of machine learning and artificial intelligence, thereby increasing the rate of measurable progress in this domain.
