Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust
This work assesses AI code generation for scientific computing across multiple languages, but it is incremental as it focuses on evaluating existing models without introducing new methods.
The study evaluated ChatGPT 3.5 and 4 in generating code for scientific programs across nine programming languages, finding that while both versions produced compilable and runnable codes, some languages were easier for the AI to handle and parallel codes were particularly challenging to generate correctly.
This study evaluates the capabilities of ChatGPT versions 3.5 and 4 in generating code across a diverse range of programming languages. Our objective is to assess the effectiveness of these AI models for generating scientific programs. To this end, we asked ChatGPT to generate three distinct codes: a simple numerical integration, a conjugate gradient solver, and a parallel 1D stencil-based heat equation solver. The focus of our analysis was on the compilation, runtime performance, and accuracy of the codes. While both versions of ChatGPT successfully created codes that compiled and ran (with some help), some languages were easier for the AI to use than others (possibly because of the size of the training sets used). Parallel codes, even the simple example we chose to study here, were also difficult for the AI to generate correctly.
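For readers unfamiliar with the third task, the following is a minimal serial sketch of a 1D stencil-based heat equation update. It is not the code used or generated in the study. The explicit finite-difference scheme, the periodic boundary conditions, and the names `heat_step` and `solve_heat` are assumptions made here for illustration, and the parallel decomposition that the study found difficult is deliberately left out.

```python
def heat_step(u, alpha, dt, dx):
    """Advance the temperature field u by one explicit time step.

    Assumes periodic boundaries (an illustrative choice): the left
    neighbour of cell 0 is the last cell, and vice versa.
    """
    n = len(u)
    r = alpha * dt / (dx * dx)  # the explicit scheme is stable for r <= 0.5
    # Three-point stencil: each cell is updated from itself and its two neighbours.
    return [
        u[i] + r * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
        for i in range(n)
    ]


def solve_heat(u0, alpha, dt, dx, steps):
    """Apply heat_step repeatedly, starting from the initial field u0."""
    u = list(u0)
    for _ in range(steps):
        u = heat_step(u, alpha, dt, dx)
    return u


if __name__ == "__main__":
    # A single hot cell in the middle of a cold rod.
    initial = [0.0] * 10
    initial[5] = 1.0
    final = solve_heat(initial, alpha=1.0, dt=0.1, dx=1.0, steps=50)
    print(final)
```

A parallel version would typically split the cells among workers and exchange the boundary ("ghost") cells between neighbouring partitions at every step, which is the kind of coordination a serial sketch like this one does not have to handle.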