CY, AI, Oct 11, 2024

Testing GPT-4-o1-preview on math and science problems: A follow-up study

arXiv:2410.22340v1, 6 citations, h-index: 1
Originality: Synthesis-oriented
AI Analysis

This is an incremental study assessing AI model improvements for educational and scientific problem-solving tasks.

The study tested GPT-4o1-preview on 105 high-school-level and college-level math and science problems. Performance improved significantly over the August 2023 tests of GPT-4 with plug-ins but remained imperfect, with spatial reasoning posing particular challenges.

In August 2023, Scott Aaronson and I reported the results of testing GPT-4 with the Wolfram Alpha and Code Interpreter plug-ins on a collection of 105 original high-school-level and college-level science and math problems (Davis and Aaronson, 2023). In September 2024, I tested the recently released model GPT-4o1-preview on the same collection. Overall, I found that performance had significantly improved but was still considerably short of perfect. In particular, problems that involve spatial reasoning are often stumbling blocks.
