CL AI LG | Sep 7, 2023

XGen-7B Technical Report

Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu

CMU, Microsoft, Salesforce

arXiv:2309.03450v1 | 5.517 citations | h-index: 62 | Has Code

Originality: Incremental advance

AI Analysis

XGen provides an open-source alternative for researchers and developers who need longer context windows, though it is an incremental improvement over existing open-source models.

The authors address the limited sequence length of open-source large language models by training XGen, a series of 7B-parameter models with sequence lengths of up to 8K on up to 1.5T tokens. The models achieve comparable or better results than state-of-the-art open-source LLMs on standard benchmarks, and the 8K-sequence models show benefits over 2K-sequence open-source LLMs on long-sequence tasks.

Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.
