CL · Apr 14, 2022

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

arXiv:2204.06745v1 · 1010 citations · h-index: 32 · Has Code
Originality: Synthesis-oriented
AI Analysis

The work provides an open-source alternative to proprietary large language models for researchers and developers, though its contributions in scale and accessibility are incremental.

The authors address the limited access to large-scale language models by introducing GPT-NeoX-20B, a 20-billion-parameter open-source autoregressive model trained on the Pile. They find it to be a powerful few-shot reasoner: it gains more in performance under five-shot evaluation than similarly sized GPT-3 and FairSeq models do.
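To make the "20 billion parameters" figure concrete, the sketch below applies the common rule of thumb that a GPT-style decoder block holds roughly 12·d² weights (4·d² for the Q, K, V and output projections, 8·d² for a 4×-wide MLP), plus the token-embedding matrices. The hyperparameters used (44 layers, hidden size 6144, a padded vocabulary of 50432, untied input and output embeddings) are assumptions drawn from the publicly released configuration as we understand it; they are not stated in this summary. Biases and LayerNorm weights are ignored, so the result is an approximation.

```python
def approx_decoder_params(n_layers: int, d_model: int, vocab: int,
                          tied_embeddings: bool = False) -> int:
    """Rough parameter count for a GPT-style decoder (biases and LayerNorm ignored)."""
    # Per block: 4*d^2 (Q, K, V, output projections) + 8*d^2 (MLP with 4*d hidden units).
    per_block = 12 * d_model ** 2
    # Untied models keep separate input-embedding and output-projection matrices.
    n_embedding_mats = 1 if tied_embeddings else 2
    embeddings = n_embedding_mats * vocab * d_model
    return n_layers * per_block + embeddings


# Assumed GPT-NeoX-20B hyperparameters (not given in this summary).
total = approx_decoder_params(n_layers=44, d_model=6144, vocab=50432)
print(f"{total:,}")           # 20,551,041,024
print(f"{total / 1e9:.2f}B")  # 20.55B
```

The transformer blocks account for about 19.9B of the total and the two embedding matrices for about 0.6B, which is why the untied-embedding assumption matters for landing near 20.5B.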

We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding, mathematics, and knowledge-based tasks. We find that GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.
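"Evaluated five-shot" means the model sees five solved examples in its prompt before the item it must answer, with no weight updates. The sketch below shows one way such a k-shot prompt can be assembled; the `Q:`/`A:` template and the helper name `build_k_shot_prompt` are hypothetical illustrations, not the prompt format used by the authors' evaluation code.

```python
from typing import List, Tuple


def build_k_shot_prompt(demos: List[Tuple[str, str]], query: str, k: int = 5) -> str:
    """Prepend k solved (question, answer) demonstrations to an unsolved query.

    Hypothetical template: each demo is "Q: ...\\nA: ...", separated by blank lines.
    k=0 yields a zero-shot prompt containing only the query.
    """
    if not 0 <= k <= len(demos):
        raise ValueError(f"k must be in [0, {len(demos)}], got {k}")
    shots = [f"Q: {q}\nA: {a}" for q, a in demos[:k]]
    # The final block stops after "A:" so the model's continuation is its answer.
    shots.append(f"Q: {query}\nA:")
    return "\n\n".join(shots)


demos = [(f"What is {i} + 1?", str(i + 1)) for i in range(1, 6)]
prompt = build_k_shot_prompt(demos, "What is 6 + 1?", k=5)
print(prompt)
```

Comparing a model's accuracy on `k=0` and `k=5` prompts for the same items is what exposes the kind of few-shot gain the abstract reports.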

Code Implementations: 11 repos

Data from Papers with Code (CC-BY-SA-4.0)

