360Zhinao Technical Report
This work presents an incremental improvement in language model development, offering a new model with extended context capabilities for general AI applications.
The authors develop the 360Zhinao-7B model by pretraining on 3.4T tokens and extending its context window to 32K and 360K, achieving competitive performance among models of similar size.
We present the 360Zhinao models, with 7B parameters and context lengths of 4K, 32K and 360K, all available at https://github.com/Qihoo360/360zhinao. For rapid development in pretraining, we establish a stable and sensitive ablation environment to evaluate and compare experiment runs with minimal model size. Under such guidance, we refine our data cleaning and composition strategies to pretrain $\texttt{360Zhinao-7B-Base}$ on 3.4T tokens. We likewise emphasize data during alignment, where we strive to balance quantity and quality through filtering and reformatting. With tailored data, 360Zhinao-7B's context window is easily extended to 32K and 360K. Reward models (RMs) and RLHF are trained following SFT and reliably applied to specific tasks. Together, these contributions lead to 360Zhinao-7B's competitive performance among models of similar size.