CL · AI · May 25

PennySynth: RAG-Driven Data Synthesis for Automated Quantum Code Generation

Minghao Shao, Nouhaila Innan, Hariharan Janardhanan, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique

arXiv:2605.25572 · 20.6

Predicted impact: top 78% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For quantum programmers using PennyLane, this work targets LLM hallucination of framework-specific code and reports pass@5 gains of 25 to 28 percentage points over Claude Sonnet 4.6 without retrieval.

PennySynth uses retrieval-augmented generation with a curated knowledge base of 13,389 PennyLane instruction-code pairs to improve LLM-based quantum code generation, achieving 52-68% pass@5 on QHack 2022-2024 challenges and outperforming Claude Sonnet 4.6 without retrieval by 25-28 percentage points.
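The pass@5 figures quoted above can be read through the standard unbiased pass@k estimator commonly used for code-generation benchmarks. The sketch below is illustrative only: the summary does not say which estimator or how many samples per challenge the authors used, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: how many of them pass the tests,
    k: sampling budget. Assumption: this is the common estimator
    1 - C(n - c, k) / C(n, k); the paper may compute pass@5 differently.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per challenge."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up outcomes for four challenges with five samples each.
toy_results = [(5, 0), (5, 2), (5, 0), (5, 1)]
toy_score = mean_pass_at_k(toy_results, k=5)
```

With exactly five samples per challenge, pass@5 reduces to the fraction of challenges for which at least one sample passes, so the toy data above scores one half.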

The growing complexity of quantum programming frameworks has exposed a critical limitation in existing large language model (LLM)-based code assistants: general-purpose models hallucinate PennyLane-specific gate names, misplace device configurations, and produce structurally invalid circuits when faced with specialized quantum coding challenges. We present PennySynth, a retrieval-augmented generation framework that addresses this gap by conditioning LLM inference on a curated knowledge base of 13,389 PennyLane instruction-code pairs, built via a three-stage extraction, verification, and deduplication pipeline over official PennyLane repositories, community GitHub sources, and QHack competition archives. PennySynth introduces a code-aware embedding strategy using st-codesearch-distilroberta-base, a model trained for natural-language-to-code retrieval, which increases average retrieval cosine similarity from 0.45 to 0.726 compared to a general-purpose baseline. Evaluated on 74 challenges from three years of the QHack competition, PennySynth achieves 64%, 68%, and 52% pass@5 on QHack 2022, 2023, and 2024, respectively, improving over Claude Sonnet 4.6 without retrieval by 28, 25, and 28 percentage points. We further introduce a quantum-adapted CodeBLEU metric that upweights qml.* token patterns and show that structural code similarity and functional correctness capture distinct aspects of quantum code quality. Controlled ablations reveal that code-aware embeddings are the primary driver of retrieval performance, while dataset expansion and source composition provide additional gains when retrieval is sufficiently precise.
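The retrieval step described in the abstract can be pictured as ranking knowledge-base entries by cosine similarity to the query embedding and prepending the best matches to the prompt. The sketch below is a minimal illustration, not the authors' pipeline: the three-dimensional vectors are hand-written stand-ins for real embeddings (which, per the abstract, would come from st-codesearch-distilroberta-base), and `cosine`, `retrieve`, `build_prompt`, and the prompt layout are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_vec, knowledge_base, top_k=2):
    """Return the top_k (score, entry) pairs, highest similarity first."""
    scored = [(cosine(query_vec, e["vec"]), e) for e in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(task, hits):
    """Assemble a prompt from retrieved instruction-code pairs (assumed layout)."""
    parts = ["Reference examples:"]
    for _score, entry in hits:
        parts.append(f"# {entry['instruction']}\n{entry['code']}")
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)


# Toy knowledge base; vectors are placeholders, not real model embeddings.
KB = [
    {"instruction": "Apply a Hadamard gate to wire 0",
     "code": "qml.Hadamard(wires=0)", "vec": [0.9, 0.1, 0.0]},
    {"instruction": "Create a two-qubit simulator device",
     "code": 'dev = qml.device("default.qubit", wires=2)', "vec": [0.1, 0.9, 0.1]},
    {"instruction": "Entangle wires 0 and 1 with a CNOT",
     "code": "qml.CNOT(wires=[0, 1])", "vec": [0.7, 0.2, 0.6]},
]

query_vec = [0.6, 0.1, 0.6]  # placeholder embedding of the task text
hits = retrieve(query_vec, KB, top_k=2)
prompt = build_prompt("Prepare a Bell state on two wires", hits)
```

In this toy setup the CNOT and Hadamard entries are retrieved and the device entry is left out, which mirrors the idea that the model sees only the closest instruction-code pairs at inference time.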
