CL AI Sep 4, 2019

From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld

arXiv:1909.01958v37.3105 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of AI systems mastering standardized science exams used for educational assessment. It represents a milestone, but the advance is incremental because it covers only multiple-choice questions in a restricted domain.

The Aristo project tackled AI performance on standardized science exams, scoring over 90% on the Grade 8 and over 83% on the Grade 12 New York Regents Science Exam non-diagram, multiple-choice questions. For comparison, the best system in 2016 achieved 59.3% on an 8th Grade science exam challenge.

AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge. This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple choice (NDMC) questions. In addition, our Aristo system, building upon the success of recent language models, exceeded 83% on the corresponding Grade 12 Science Exam NDMC questions. The results, on unseen test questions, are robust across different test years and different variations of this kind of test. They demonstrate that modern NLP methods can result in mastery on this task. While not a full solution to general question-answering (the questions are multiple choice, and the domain is restricted to 8th Grade science), it represents a significant milestone for the field.
