AR AI LG | Jan 3, 2025

QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture

Shvetank Prakash, Andrew Cheng, Jason Yik, Arya Tschand, Radhika Ghosal, Ikechukwu Uchendu, Jessica Quaye, Jeffrey Ma, Shreyas Grampurohit, Sofia Giannuzzi, Arnav Balyan, Fin Amin

arXiv:2501.01892v24.36 citations | h-index: 46 | Has Code | IEEE Computer Architecture Letters

Originality: Synthesis-oriented

AI Analysis

QuArch provides a new benchmark for AI agents in computer architecture, though the contribution is incremental: it focuses on dataset creation and evaluation rather than on new methods.

The authors tackled the problem of evaluating language models' understanding of computer architecture by introducing QuArch, a dataset of 1500 human-validated question-answer pairs, and found that fine-tuning with it improved small model accuracy by up to 8%.

We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models' understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. We observe notable struggles in memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are at https://harvard-edge.github.io/QuArch/.
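The accuracy figures and per-area weaknesses reported above come from scoring model answers against validated answers. A minimal sketch of that kind of scoring follows, assuming each record carries a topic label and a single reference answer that can be compared by exact match. The field names (`topic`, `answer`), the function name, and the toy records are hypothetical illustrations and are not taken from the QuArch release.

```python
from collections import defaultdict


def accuracy_by_topic(records, predictions):
    """Return (overall_accuracy, {topic: accuracy}) using exact-match scoring.

    `records` is a list of dicts with hypothetical keys "topic" and "answer";
    `predictions` is a list of model answers in the same order.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec, pred in zip(records, predictions):
        total[rec["topic"]] += 1
        if pred == rec["answer"]:
            correct[rec["topic"]] += 1
    per_topic = {topic: correct[topic] / total[topic] for topic in total}
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    return overall, per_topic


# Toy records invented for illustration only (not real QuArch data).
records = [
    {"topic": "memory systems", "answer": "A"},
    {"topic": "memory systems", "answer": "B"},
    {"topic": "processor design", "answer": "C"},
    {"topic": "processor design", "answer": "D"},
]
predictions = ["A", "C", "C", "D"]

overall, per_topic = accuracy_by_topic(records, predictions)
print(f"overall: {overall:.2f}")
for topic, acc in sorted(per_topic.items()):
    print(f"{topic}: {acc:.2f}")
```

Grouping by topic is what makes weaknesses such as memory systems visible: a single overall number would hide them.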

