CL · Nov 9, 2023

Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset

arXiv:2311.05113v1 · 134 citations · h-index: 27 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides a domain-specific benchmark for evaluating AI's mathematical knowledge and reasoning, though its contribution is incremental because it focuses on a single narrow topic.

The authors address the lack of detailed benchmarks for analyzing AI's mathematical reasoning by introducing Conic10K, a dataset of 10,000 challenging conic section problems in Chinese. Experiments on it reveal that existing large language models, including GPT-4, perform poorly on complex reasoning tasks.

Mathematical understanding and reasoning are crucial tasks for assessing the capabilities of artificial intelligence (AI). However, existing benchmarks either require only a few reasoning steps or contain only a small amount of data on any one topic, which makes it hard to analyse in detail how an AI system behaves across different problems within a single topic. In this work, we propose Conic10K, a challenging math problem dataset on conic sections from Chinese senior high school education. The dataset contains problems of varying reasoning depth, all of which require only knowledge of conic sections. Because the dataset involves such a narrow range of knowledge, it is easy to analyse separately the knowledge a model possesses and its reasoning ability. For each problem, we provide a high-quality formal representation, the reasoning steps, and the final solution. Experiments show that existing large language models, including GPT-4, exhibit weak performance on complex reasoning. We hope that our findings will inspire more advanced techniques for precise natural language understanding and reasoning. Our dataset and code are available at https://github.com/whyNLP/Conic10K.
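Each problem is described as carrying a formal representation, reasoning steps, and a final solution. The sketch below shows one way such a record and a simple answer-accuracy metric could look. It is illustrative only: the `ConicProblem` class, its field names, `normalize`, and `accuracy` are hypothetical and do not reflect the schema or evaluation script in the whyNLP/Conic10K repository. Whitespace-insensitive exact match is an assumed scoring rule; a real evaluator for math answers may need symbolic equivalence checking.

```python
from dataclasses import dataclass, field


@dataclass
class ConicProblem:
    """Hypothetical record layout; field names are illustrative, not the repo's schema."""
    text: str                                        # problem statement
    formal: str                                      # formal representation of the problem
    steps: list[str] = field(default_factory=list)  # reasoning steps
    answer: str = ""                                 # final solution


def normalize(ans: str) -> str:
    """Assumed rule: drop all whitespace so '2 * x' and '2*x' compare equal."""
    return "".join(ans.split())


def accuracy(problems: list[ConicProblem], predictions: list[str]) -> float:
    """Fraction of predictions whose normalized text equals the normalized reference."""
    if len(problems) != len(predictions):
        raise ValueError("need exactly one prediction per problem")
    if not problems:
        return 0.0
    correct = sum(
        normalize(p.answer) == normalize(pred)
        for p, pred in zip(problems, predictions)
    )
    return correct / len(problems)


# Usage with a made-up toy problem (not taken from the dataset):
# the ellipse x^2/4 + y^2 = 1 has a = 2, b = 1, c = sqrt(3), so e = sqrt(3)/2.
toy = ConicProblem(
    text="Find the eccentricity of the ellipse x^2/4 + y^2 = 1.",
    formal="(illustrative) Expression(C) = (x^2/4 + y^2 = 1); Eccentricity(C) = ?",
    steps=["a = 2, b = 1", "c = sqrt(a^2 - b^2) = sqrt(3)", "e = c / a"],
    answer="sqrt(3)/2",
)
print(accuracy([toy], ["sqrt(3) / 2"]))  # 1.0
```

Keeping the reasoning steps separate from the final answer mirrors the dataset description and would allow step-level analysis, but the metric above scores only final answers.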

Code Implementations: 1 repo
