CL AI LG

Feb 26, 2024

Nemotron-4 15B Technical Report

Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu

NVIDIA

arXiv:2402.16819v2

17.734 citations

h-index: 40

Originality: Synthesis-oriented

AI Analysis

This work provides a competitive open model for multilingual and coding tasks, but it is incremental as it builds on existing large language model paradigms.

The paper introduces Nemotron-4 15B, a 15-billion-parameter multilingual language model trained on 8 trillion tokens, which outperforms similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves the best multilingual capabilities among models of its size, even beating larger and specialized ones.

We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.
