CL AI LO

Oct 16, 2023

Llemma: An Open Language Model For Mathematics

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

Cambridge

arXiv:2310.10631v3

31.5457 citations

h-index: 34

Has Code

Originality: Incremental advance

AI Analysis

Llemma provides an open model for mathematics that can enhance tools for researchers and students, though the advance is incremental because it builds on an existing pretrained model, Code Llama.

The authors develop a large language model specialized for mathematics by continuing to pretrain Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code. On the MATH benchmark, the resulting model, Llemma, outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis.

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
