CL, AI
May 22, 2024

360Zhinao Technical Report

arXiv:2405.13386v1
Has Code
Originality: Synthesis-oriented
AI Analysis

This work presents an incremental improvement in language model development: a 7B-parameter model whose context window is extended to as much as 360K tokens.

The authors pretrain 360Zhinao-7B on 3.4T tokens and extend its context window to 32K and 360K, reporting competitive performance among models of similar size.

We present the 360Zhinao models, with 7B parameters and context lengths of 4K, 32K and 360K, all available at https://github.com/Qihoo360/360zhinao. For rapid development in pretraining, we establish a stable and sensitive ablation environment to evaluate and compare experiment runs with minimal model size. Under such guidance, we refine our data cleaning and composition strategies to pretrain 360Zhinao-7B-Base on 3.4T tokens. We also emphasize data during alignment, where we strive to balance quantity and quality through filtering and reformatting. With tailored data, 360Zhinao-7B's context window is easily extended to 32K and 360K. Reward models (RMs) and RLHF are trained following SFT and credibly applied to specific tasks. Altogether, these contributions lead to 360Zhinao-7B's competitive performance among models of similar size.

Code Implementations: 1 repo
