CL AI Dec 12, 2024

Phi-4 Technical Report

Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee

arXiv:2412.08905v1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enhancing reasoning abilities in language models for STEM applications, though it is incremental as it builds on the phi-3 architecture with minimal changes.

The authors improve language model performance, especially on STEM-focused QA, by prioritizing data quality and incorporating synthetic data throughout training. As a result, phi-4 surpasses its teacher model, GPT-4, on these capabilities.

We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation. Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size -- especially on reasoning-focused benchmarks -- due to improved data, training curriculum, and innovations in the post-training scheme.
