CL · AI · LG · Feb 26, 2024

Nemotron-4 15B Technical Report

NVIDIA
arXiv:2402.16819v2 · 34 citations · h-index: 40
Originality: Synthesis-oriented
AI Analysis

This work provides a competitive open model for multilingual and coding tasks, but it is incremental, building on existing large language model paradigms.

The paper introduces Nemotron-4 15B, a 15-billion-parameter multilingual language model trained on 8 trillion tokens. It outperforms similarly sized open models in 4 of 7 downstream evaluation areas and has the strongest multilingual capabilities among models of its size, even beating models over four times larger and models specialized for multilingual tasks.

We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance on English, multilingual, and coding tasks: it outperforms all existing similarly sized open models on 4 out of 7 downstream evaluation areas and achieves performance competitive with the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.

