QSpark: Towards Reliable Qiskit Code Generation
The work addresses error-prone AI-generated quantum code, a practical problem for quantum researchers and developers, but its contribution is incremental and it has clear limitations on advanced tasks.
The paper tackles reliable Qiskit code generation for quantum circuits by fine-tuning the Qwen2.5-Coder-32B model with RL methods, achieving up to 56.29% Pass@1 on the Qiskit HumanEval benchmark, roughly 10 percentage points above the Granite-8B-QK baseline.
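Pass@1 is the share of benchmark tasks for which a generated solution passes the task's unit tests. A minimal sketch, assuming the unbiased pass@k estimator commonly used with HumanEval-style benchmarks; the summary does not say how many samples QSpark draws per task, and the function names are illustrative.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples were generated and c of them passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # every size-k subset of the samples contains at least one passing sample
        return 1.0
    # probability that a random size-k subset contains no passing sample, complemented
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average per-task estimates; results holds one (n, c) pair per benchmark task."""
    return mean(pass_at_k(n, c, k) for n, c in results)


# Example: one sample per task, 85 of 151 tasks solved.
score = benchmark_pass_at_k([(1, 1)] * 85 + [(1, 0)] * 66, k=1)
```

With one sample per task (n = 1) the estimate reduces to the solved fraction, so solving 85 of 151 tasks gives 85/151, about 56.29%. Drawing more samples per task lowers the variance of the estimate without changing what it measures.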
Quantum circuits must be error-resilient, yet LLMs such as Granite-20B-Code and StarCoder often output flawed Qiskit code. We fine-tuned the Qwen2.5-Coder-32B model with two RL methods, Group Relative Policy Optimization (GRPO) and Odds-Ratio Preference Optimization (ORPO), using a richly annotated synthetic dataset. On the Qiskit HumanEval benchmark, ORPO reaches 56.29% Pass@1 ($\approx+10$ pp over Granite-8B-QK) and GRPO reaches 49%, both beating all general-purpose baselines; on the original HumanEval they score 65.90% and 63.00%, respectively. ORPO solves 44 of 78 basic tasks and 41 of 68 intermediate ones, but neither GRPO nor ORPO solves any of the five advanced tasks, showing clear gains while leaving room for progress in AI-assisted quantum programming.
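GRPO (introduced with DeepSeekMath) drops PPO's learned value critic: it samples a group of completions per prompt, scores each one, and uses each reward standardized within its group as the advantage in a PPO-style clipped objective. The sketch below shows only those two pieces, under stated assumptions: rewards are plain floats (for Qiskit code, plausibly a unit-test pass rate, but QSpark's actual reward design is not described here), the KL penalty is omitted, and the 0.2 clipping range is a common default rather than a value from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each reward against its group (all completions for one prompt)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    # eps keeps the division finite when every completion receives the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped term for one completion; ratio = pi_new(y|x) / pi_old(y|x)."""
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # keep the pessimistic (smaller) of the unclipped and clipped objectives
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical group of four completions for one prompt; each reward is assumed to be
# the fraction of (hypothetical) unit tests that the generated Qiskit program passes.
rewards = [1.0, 0.5, 0.0, 0.5]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group average receive positive advantages and are reinforced, while those below average are pushed down; because the baseline comes from the group itself, no separate value network has to be trained.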
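ORPO folds preference learning into supervised fine-tuning: on top of the usual negative log-likelihood of the preferred completion, it adds a penalty equal to $-\log\sigma$ of the log odds ratio between the preferred and rejected completions, where odds are computed from length-normalized sequence likelihoods. The scalar sketch below assumes the average per-token log-probability of each completion has already been computed; `lam` (default 0.1 here) is an illustrative weighting hyperparameter, not a value reported for QSpark, and the helper names are hypothetical.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """-log(sigmoid(x)) = log(1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp), the length-normalized likelihood."""
    if avg_logp >= 0.0:
        raise ValueError("average log-probability must be negative")
    # log(1 - p) via log1p(-exp(.)) stays accurate when p is small
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """SFT negative log-likelihood on the preferred answer plus a weighted odds-ratio penalty."""
    sft_term = -chosen_avg_logp
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    return sft_term + lam * neg_log_sigmoid(log_odds_ratio)
```

For example, `orpo_loss(-0.2, -1.5)` (preferred completion already much more likely) is smaller than `orpo_loss(-1.5, -0.2)`, because both the likelihood term and the odds-ratio penalty shrink as the preferred completion gains probability relative to the rejected one. Setting `lam=0.0` recovers plain supervised fine-tuning.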