CL, LG · Jan 30, 2024

H2O-Danube-1.8B Technical Report

arXiv:2401.16818v2 · 12 citations · h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work offers an incremental improvement to small language models and, by releasing them under an open license, makes them more economically accessible to a broader audience.

The authors introduce H2O-Danube, a series of 1.8B-parameter language models. H2O-Danube2-1.8B, trained on 3T tokens in total, achieves the top ranking on the Open LLM Leaderboard among models under 2B parameters.

We present H2O-Danube, a series of small 1.8B-parameter language models consisting of H2O-Danube-1.8B, trained on 1T tokens, and the incrementally improved H2O-Danube2-1.8B, trained on an additional 2T tokens. Our models achieve highly competitive results across a multitude of benchmarks and, as of the time of this writing, H2O-Danube2-1.8B holds the top ranking on the Open LLM Leaderboard among all models below 2B parameters. The models follow the core principles of Llama 2 and Mistral, and we leverage and refine various techniques for pre-training large language models. We additionally release chat models trained with supervised fine-tuning followed by direct preference optimization. We make all models openly available under the Apache 2.0 license, further democratizing LLMs economically for a wider audience.
