AI CL LG QUANT-PH
Oct 30, 2025

QuantumBench: A Benchmark for Quantum Problem Solving

Shunya Minami, Tatsuya Ishigaki, Ikko Hamamura, Taku Mikuriya, Youmi Ma, Naoaki Okazaki, Hiroya Takamura, Yohichi Suzuki, Tadashi Kadowaki

arXiv:2511.00092v17.83 citations
h-index: 27

Originality Synthesis-oriented

AI Analysis

QuantumBench gives quantum researchers a domain-specific benchmark for assessing LLMs, but the contribution is incremental: it applies existing evaluation methods to a new field.

The authors address the problem of evaluating how well large language models understand quantum science by introducing QuantumBench, a benchmark of approximately 800 eight-option multiple-choice questions across nine areas. They find that existing models vary in performance and are sensitive to question format.

Large language models are now integrated into many scientific workflows, accelerating data analysis, hypothesis generation, and design space exploration. In parallel with this growth, there is a growing need to carefully evaluate whether models accurately capture domain-specific knowledge and notation, since general-purpose benchmarks rarely reflect these requirements. This gap is especially clear in quantum science, which features non-intuitive phenomena and requires advanced mathematics. In this study, we introduce QuantumBench, a benchmark for the quantum domain that systematically examines how well LLMs understand and can be applied to this non-intuitive field. Using publicly available materials, we compiled approximately 800 questions with their answers spanning nine areas related to quantum science and organized them into an eight-option multiple-choice dataset. With this benchmark, we evaluate several existing LLMs and analyze their performance in the quantum domain, including sensitivity to changes in question format. QuantumBench is the first LLM evaluation dataset built for the quantum domain, and it is intended to guide the effective use of LLMs in quantum research.
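
As a rough illustration of how an eight-option multiple-choice benchmark of this kind could be scored, the sketch below computes accuracy and shuffles the option order, which is one simple way to probe sensitivity to question format. It is not the authors' evaluation code: the `Question` class, the A to H labeling, and the `predict` callback (a stand-in for a model call that returns a single letter) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

LABELS = "ABCDEFGH"  # assumed labeling for the eight options


@dataclass
class Question:
    """Hypothetical record layout; the real dataset schema may differ."""
    prompt: str
    options: list[str]   # eight answer options
    answer_index: int    # position of the correct option


def shuffle_options(q: Question, rng: random.Random) -> Question:
    """Return a copy of q with its options permuted and the answer index updated."""
    order = list(range(len(q.options)))
    rng.shuffle(order)
    return Question(
        prompt=q.prompt,
        options=[q.options[i] for i in order],
        answer_index=order.index(q.answer_index),
    )


def format_prompt(q: Question) -> str:
    """Render the question text followed by one labeled option per line."""
    lines = [q.prompt] + [f"{LABELS[i]}. {opt}" for i, opt in enumerate(q.options)]
    return "\n".join(lines)


def accuracy(questions: list[Question], predict) -> float:
    """Fraction of questions for which predict(prompt) returns the correct letter."""
    correct = sum(
        predict(format_prompt(q)) == LABELS[q.answer_index] for q in questions
    )
    return correct / len(questions)
```

With eight options, uniform random guessing gives an expected accuracy of 1/8 = 12.5%, which is the natural baseline against which to read any reported score. Comparing `accuracy` on the original questions and on their shuffled copies gives a crude measure of how much a model depends on option position.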
