First Proof
This paper provides a new benchmark for assessing AI in advanced mathematics, though the contribution is incremental because it focuses on a single domain.
The authors introduce a set of ten previously unpublished research-level mathematics questions to evaluate whether AI systems can answer them correctly; the answers are known to the authors but remain encrypted for a short time.
To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors. The questions had not been shared publicly until now; the answers are known to the authors of the questions but will remain encrypted for a short time.