CL, LG · May 28, 2025

360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training

Haosheng Zou, Xiaowei Lv, Shousheng Jia, Lin Li, Xiaochun Gong, Xiangzheng Zhang

arXiv:2505.22296v26.72 citations · h-index: 6 · Has Code

Originality: Synthesis-oriented

AI Analysis

This is an incremental improvement for developers and researchers working with large language models, as it provides a plug-and-play solution for sequence parallelism.

The paper tackles the problem of enabling sequence parallelism for long-sequence post-training of large language models. It introduces 360-LLaMA-Factory, an open-source tool that adds sequence parallelism to LLaMA-Factory and has been used in models such as Light-R1 and TinyR1 and in large companies' training frameworks.

Adding sequence parallelism to LLaMA-Factory, we open-sourced 360-LLaMA-Factory at https://github.com/Qihoo360/360-LLaMA-Factory. 360-LLaMA-Factory has received wide recognition and has been used in models such as Light-R1 (arXiv:2503.10460), TinyR1 (arXiv:2503.04872), and Kaggle AIMO math models, as well as in large companies' training frameworks. This technical report delves deeper into the different sequence parallel modes behind 360-LLaMA-Factory and discusses our implementation insights.

