CL AI LG
Dec 23, 2023

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang

arXiv:2312.15166v3 24.1220 citations NAACL

Originality: Incremental advance

AI Analysis

This paper provides a simple yet effective method for scaling LLMs and promotes broad access through open licensing, though it appears incremental relative to existing up-scaling approaches.

The authors tackled the problem of efficiently scaling large language models by introducing SOLAR 10.7B, a 10.7 billion parameter model that demonstrates superior performance in various NLP tasks and surpasses Mixtral-8x7B-Instruct in instruction-following capabilities.

We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes to train and run inference efficiently. We show experimentally that DUS is simple yet effective in scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
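The depthwise scaling step of DUS can be illustrated with a short sketch. The abstract does not spell out the layer arithmetic, so the version below assumes the configuration described in the full paper: a 32-layer base model is duplicated, the last 8 layers are dropped from one copy and the first 8 from the other, and the two halves are concatenated into a 48-layer stack. The function name `depthwise_scale` and the string placeholders for layers are hypothetical, chosen only for illustration; a real implementation would copy transformer blocks and their weights, then apply continued pretraining.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def depthwise_scale(layers: Sequence[T], m: int) -> List[T]:
    """Sketch of depthwise scaling (hypothetical helper, not the authors' code).

    Conceptually duplicates the layer stack, removes the final m layers
    from the first copy and the initial m layers from the second copy,
    then concatenates them, giving 2 * (n - m) layers in total.
    """
    n = len(layers)
    if not 0 <= m < n:
        raise ValueError("m must satisfy 0 <= m < number of layers")
    first_copy = list(layers[: n - m])   # layers 0 .. n-m-1
    second_copy = list(layers[m:])       # layers m .. n-1
    return first_copy + second_copy


# Assumed configuration: n = 32 base layers, m = 8 removed from each copy.
base = [f"layer_{i}" for i in range(32)]
scaled = depthwise_scale(base, m=8)
print(len(scaled))  # 48
```

In this sketch the seam falls between `layer_23` and `layer_8`; the continued pretraining stage named in the abstract is what would recover performance after the two halves are joined.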

