SCIMAT: Science and Mathematics Dataset
SCIMAT provides a new benchmark for AI research in education and problem solving, though its contribution is incremental because it focuses on dataset creation rather than on novel methods.
The authors introduced SCIMAT, a large open-source dataset of millions of pre-college and college-level math and science problems, and demonstrated preliminary results using a transformer model with character-to-character encoding.
In this work, we announce a comprehensive, well-curated, open-source dataset with millions of samples of pre-college and college-level problems in mathematics and science. A preliminary set of results using a transformer architecture with character-to-character encoding is shown. The dataset identifies some challenging problems and invites research on better architecture search.
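The abstract names character-to-character encoding but does not spell out how it works. Here is a minimal sketch, assuming each problem and answer is split into individual characters that are mapped to integer IDs, with added start, end, and padding tokens so a sequence-to-sequence transformer can consume them in batches. The token names, helper functions, and example problems are hypothetical illustrations, not taken from SCIMAT or its authors' code.

```python
# Hypothetical sketch of character-level (character-to-character) encoding
# for a seq2seq model; not the SCIMAT authors' implementation.
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"  # assumed special tokens


def build_vocab(texts):
    """Map every distinct character in `texts` to an integer ID."""
    chars = sorted({c for t in texts for c in t})
    itos = [PAD, SOS, EOS] + chars          # ID -> token
    stoi = {s: i for i, s in enumerate(itos)}  # token -> ID
    return stoi, itos


def encode(text, stoi, add_markers=True):
    """Turn a string into character IDs, optionally wrapped in SOS/EOS."""
    ids = [stoi[c] for c in text]
    if add_markers:
        ids = [stoi[SOS]] + ids + [stoi[EOS]]
    return ids


def decode(ids, itos):
    """Turn IDs back into a string, dropping the special tokens."""
    specials = {PAD, SOS, EOS}
    return "".join(itos[i] for i in ids if itos[i] not in specials)


def pad_batch(seqs, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(s) for s in seqs)
    return [s + [pad_id] * (width - len(s)) for s in seqs]


# Made-up question/answer pairs for illustration (not SCIMAT samples).
pairs = [("Solve 2x + 3 = 7 for x.", "2"), ("What is 12 * 11?", "132")]
stoi, itos = build_vocab([q for q, _ in pairs] + [a for _, a in pairs])
src = pad_batch([encode(q, stoi) for q, _ in pairs], stoi[PAD])  # model inputs
tgt = pad_batch([encode(a, stoi) for _, a in pairs], stoi[PAD])  # model targets
print(decode(src[0], itos))  # Solve 2x + 3 = 7 for x.
```

A character-level vocabulary stays small and covers every symbol seen when it is built, including digits, operators, and variable names, at the cost of much longer sequences than word- or subword-level tokenization produces.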