Evaluating LLM-generated code for domain-specific languages: molecular dynamics with LAMMPS

Ethan Holbrook, Juan C. Verduzco, Alejandro Strachan

arXiv:2603.20630

AI Analysis

For domain experts in molecular dynamics who need to use LAMMPS, this work provides a method to assess LLM-generated scripts, but the findings are incremental as they confirm known limitations of LLMs in specialized domains.

The paper evaluates LLMs' ability to generate valid input scripts for LAMMPS, a molecular dynamics DSL, finding that current LLMs have significant limitations in producing scientifically correct code. The proposed evaluation procedure uses normalization and parsing to identify common errors without costly simulations.

Large language models (LLMs) are changing the way researchers interact with code and data in scientific computing. While their ability to generate general-purpose code is well established, their effectiveness in producing scientifically valid code and input scripts for domain-specific languages (DSLs) remains largely unexplored. We propose an evaluation procedure that enables domain experts (who may not be experts in the DSL) to assess the validity of LLM-generated input files for LAMMPS, a widely used molecular dynamics (MD) code, and to use those assessments to evaluate the performance of state-of-the-art LLMs and identify common issues. Key to the evaluation procedure are a normalization step to generate canonical files and an extensible parser for syntax analysis. The following steps isolate common errors without incurring costly tests (in time and computational resources). Once a working input file is generated, LLMs can accelerate verification tests. Our findings highlight limitations of LLMs in generating scientific DSLs and a practical path forward for their integration into domain-specific computational ecosystems by domain experts.
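To make the idea of normalizing a script before checking its syntax more concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`normalize`, `unknown_commands`), the small `KNOWN_COMMANDS` set, and the sample script are all hypothetical and chosen only for illustration. The sketch assumes the usual LAMMPS input conventions (a trailing `&` continues a line, `#` starts a comment, and the first word of a line is the command) and ignores quoting and variable substitution.

```python
# Illustrative subset of LAMMPS command names (assumption: a real checker
# would need the complete command list and per-command argument rules).
KNOWN_COMMANDS = {
    "units", "dimension", "boundary", "atom_style", "lattice", "region",
    "create_box", "create_atoms", "mass", "pair_style", "pair_coeff",
    "velocity", "fix", "thermo", "timestep", "run",
}


def _canonical(logical_line: str) -> str:
    """Drop a trailing comment and collapse runs of whitespace."""
    text = logical_line.split("#", 1)[0]  # simplification: ignores '#' inside quotes
    return " ".join(text.split())


def normalize(script: str) -> list[str]:
    """Return canonical lines: continuations joined, comments and blank lines removed."""
    lines = []
    pending = ""  # accumulates physical lines that end with '&'
    for raw in script.splitlines():
        stripped = raw.rstrip()
        if stripped.endswith("&"):
            pending += stripped[:-1] + " "
            continue
        canon = _canonical(pending + stripped)
        pending = ""
        if canon:
            lines.append(canon)
    if pending:  # script ended in the middle of a continuation
        canon = _canonical(pending)
        if canon:
            lines.append(canon)
    return lines


def unknown_commands(lines: list[str]) -> list[tuple[int, str]]:
    """List (canonical line number, command) pairs whose command is not recognized."""
    return [
        (number, line.split()[0])
        for number, line in enumerate(lines, start=1)
        if line.split()[0] not in KNOWN_COMMANDS
    ]


# Hypothetical LLM-generated input containing one invented command.
sample = """
# 3d Lennard-Jones melt
units        lj
atom_style   atomic
pair_style   lj/cut 2.5
pair_coeff   * * 1.0 1.0 &
             2.5
thermostat   all nvt   # not a LAMMPS command
run          100
"""

if __name__ == "__main__":
    canonical = normalize(sample)
    for line in canonical:
        print(line)
    print(unknown_commands(canonical))
```

A check of this kind costs nothing compared with running a simulation, which is the appeal of catching invented commands at the parsing stage. It says nothing about whether the arguments are physically sensible, so it would complement, not replace, the later verification tests.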
