CL · LG · Jul 12, 2024

H2O-Danube3 Technical Report

arXiv:2407.09276v1 · 7 citations · h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of making large language models more accessible and efficient for deployment on mobile devices, though it is incremental in scaling down existing architectures.

The researchers developed H2O-Danube3, a series of small language models (4B and 500M parameters) trained on 6T and 4T tokens respectively, achieving competitive performance across academic, chat, and fine-tuning benchmarks while being efficient enough to run on smartphones.

We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high-quality web data consisting primarily of English tokens, in three stages with different data mixes, before final supervised tuning for the chat version. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be run efficiently on a modern smartphone, enabling local inference and rapid processing even on mobile devices. We make all models openly available under the Apache 2.0 license, further democratizing LLMs to a wider audience economically.

