Fine-Tuning Small Reasoning Models for Quantum Field Theory

Nathaniel S. Woodward, Zhiqi Gao, Yurii Kvasiuk, Kendrick M. Smith, Frederic Sala, Moritz Münchmeyer

arXiv:2604.18936 · 95.41 citations · h-index: 22 · Has Code

Predicted impact: top 4% in LG · last 90 days
Originality: Incremental advance

AI Analysis

This work provides a foundation for developing domain-specific reasoning capabilities in small LLMs for theoretical physics, addressing the scarcity of verifiable training data.

The authors performed the first academic fine-tuning study of small (7B-parameter) reasoning models for theoretical physics, specifically Quantum Field Theory (QFT). They generated over 2,500 synthetic problems and curated human-adapted problems, then conducted RL and SFT experiments, achieving performance gains and analyzing reasoning error evolution.

Despite the growing application of Large Language Models (LLMs) to theoretical physics, there is little academic exploration into how domain-specific physics reasoning ability develops while training these models. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because open-source verifiable training data required to train such capabilities is scarce, we developed a robust data generation pipeline that can both create synthetic problems and make existing human-authored problems suitable for model training. Selecting Quantum Field Theory (QFT) as our primary domain, we generated over 2,500 synthetic problems alongside a curated collection of human-adapted problems sourced from arXiv and standard pedagogical resources. We conduct both Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) experiments, benchmarking performance gains as well as generalization to other physics domains. We perform an extensive analysis of model chains of thought before and after fine-tuning to understand how reasoning errors evolve during RL and SFT. Finally, we publicly release our data pipeline, verifiable QFT training data, and $\sim$200M tokens of QFT reasoning traces.
