AI NA | Jun 25, 2025

AI Assistants to Enhance and Exploit the PETSc Knowledge Base

Barry Smith, Junchao Zhang, Hong Zhang, Lois Curfman McInnes, Murat Keceli, Archit Vasan, Satish Balay, Toby Isaac, Le Chen, Venkatram Vishwanath

arXiv:2506.20608v25.81 citations | h-index: 28 | ICPP Workshops

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of managing informal knowledge in scientific software for PETSc users and developers. Its contribution is incremental: it applies existing AI methods to a specific domain.

The authors tackle the problem of fragmented and inaccessible knowledge in the PETSc numerical library. They have begun building an LLM-powered system that uses retrieval-augmented generation and reranking tools to assist users and developers, with the goal of an extensible framework for enhancing software workflows and documentation.

Generative AI, especially through large language models (LLMs), is transforming how technical knowledge can be accessed, reused, and extended. PETSc, a widely used numerical library for high-performance scientific computing, has accumulated a rich but fragmented knowledge base over its three decades of development, spanning source code, documentation, mailing lists, GitLab issues, Discord conversations, technical papers, and more. Much of this knowledge remains informal and inaccessible to users and new developers. To activate and utilize this knowledge base more effectively, the PETSc team has begun building an LLM-powered system that combines PETSc content with custom LLM tools -- including retrieval-augmented generation (RAG), reranking algorithms, and chatbots -- to assist users, support developers, and propose updates to formal documentation. This paper presents initial experiences designing and evaluating these tools, focusing on system architecture, using RAG and reranking for PETSc-specific information, evaluation methodologies for various LLMs and embedding models, and user interface design. Leveraging the Argonne Leadership Computing Facility resources, we analyze how LLM responses can enhance the development and use of numerical software, with an initial focus on scalable Krylov solvers. Our goal is to establish an extensible framework for knowledge-centered AI in scientific software, enabling scalable support, enriched documentation, and enhanced workflows for research and development. We conclude by outlining directions for expanding this system into a robust, evolving platform that advances software ecosystems to accelerate scientific discovery.
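The retrieve-then-rerank pattern mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the four-sentence corpus, the function names (`retrieve`, `rerank`, `build_prompt`), the bag-of-words cosine retriever, and the IDF-weighted overlap reranker are all assumptions chosen so the example runs with only the Python standard library. A real pipeline would presumably use a learned embedding model, a stronger reranker, and an LLM call at the end.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus (illustrative sentences, not actual PETSc documentation text).
DOCS = [
    "KSPGMRES implements the generalized minimal residual Krylov method.",
    "KSPCG implements the conjugate gradient method for symmetric positive definite systems.",
    "Discord conversations and mailing lists hold informal user advice.",
    "Use -ksp_monitor to print the residual norm at each Krylov iteration.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic/underscore tokens."""
    return re.findall(r"[a-z_]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=3):
    """Stage 1: cheap retrieval. Bag-of-words cosine stands in for an embedding model."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def rerank(query, candidates, docs):
    """Stage 2: reorder candidates. IDF-weighted term overlap stands in for a reranker model."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    q_terms = set(tokenize(query))

    def score(doc):
        shared = q_terms & set(tokenize(doc))
        return sum(math.log((n + 1) / (df[t] + 1)) for t in shared)

    return sorted(candidates, key=score, reverse=True)

def build_prompt(query, context):
    """Stage 3: assemble the augmented prompt that would be sent to an LLM."""
    lines = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{lines}\n\nQuestion: {query}"

if __name__ == "__main__":
    question = "Which Krylov method is conjugate gradient?"
    top = rerank(question, retrieve(question, DOCS, k=3), DOCS)
    print(build_prompt(question, top[:2]))
```

The two stages are kept separate on purpose: the retriever only needs to be cheap enough to scan the whole corpus, while the reranker can afford a more expensive score because it sees only the `k` retrieved candidates.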
