NAIRLG · Sep 30, 2021

SCIMAT: Science and Mathematics Dataset

arXiv:2109.15005v1 · Has code
Originality: Synthesis-oriented
AI Analysis

The paper provides a new benchmark for research on AI in education and automated problem solving. The contribution is incremental, because it centres on building a dataset rather than on proposing new methods.

The authors introduced SCIMAT, a large open-source dataset of millions of pre-college and college-level math and science problems, and demonstrated preliminary results using a transformer model with character-to-character encoding.

In this work, we announce a comprehensive, well-curated, open-source dataset with millions of samples of pre-college and college-level problems in mathematics and science. We show a preliminary set of results using a transformer architecture with character-to-character encoding. The dataset identifies some challenging problems and invites research on better architecture search.
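To make "character-to-character encoding" concrete, the sketch below shows one common way to turn a problem and its answer into integer sequences that a sequence-to-sequence transformer could consume: every character becomes its own token. This is an illustrative assumption, not the authors' pipeline. The summary does not specify their vocabulary or special tokens, so the names `build_vocab`, `encode`, `decode`, `pad_batch` and the `<pad>`/`<s>`/`</s>` markers are hypothetical, as is the toy pair "12+7" → "19".

```python
from typing import Dict, List

# Hypothetical special tokens; the paper's actual choices are not stated here.
PAD, BOS, EOS = "<pad>", "<s>", "</s>"


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer id to every character seen in the corpus (specials first)."""
    vocab = {PAD: 0, BOS: 1, EOS: 2}
    for ch in sorted({ch for text in texts for ch in text}):
        vocab[ch] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map a string to ids, one per character, wrapped in start/end markers."""
    return [vocab[BOS]] + [vocab[ch] for ch in text] + [vocab[EOS]]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Invert encode(), dropping padding and start/end markers."""
    inverse = {i: ch for ch, i in vocab.items()}
    specials = {vocab[PAD], vocab[BOS], vocab[EOS]}
    return "".join(inverse[i] for i in ids if i not in specials)


def pad_batch(seqs: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad variable-length sequences so they can be stacked into one batch."""
    width = max(len(s) for s in seqs)
    return [s + [pad_id] * (width - len(s)) for s in seqs]


if __name__ == "__main__":
    # Toy question/answer pair (hypothetical, not drawn from the dataset).
    question, answer = "12+7", "19"
    vocab = build_vocab([question, answer])
    src, tgt = encode(question, vocab), encode(answer, vocab)
    print(src, tgt)
    print(decode(src, vocab), "->", decode(tgt, vocab))
```

A character-level vocabulary stays tiny (digits, operators, letters) and never produces out-of-vocabulary tokens, at the cost of longer sequences. That trade-off may matter when problems involve long numbers or expressions.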

Code implementations: 1 repo
