From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation
This work addresses the challenge of using LLMs in high-performance and scientific computing. Its contribution is incremental: it evaluates existing methods on new data without introducing novel techniques.
The paper evaluates how well large language models generate efficient task-based parallel code from different types of prompts. It finds that LLMs show varying strengths and weaknesses in correctness and scalability across OpenMP Tasking, C++ standard parallelism, and HPX.
Large Language Models (LLMs) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. This paper explores how LLMs generate task-based parallel code from three kinds of input prompts: natural language problem descriptions, sequential reference implementations, and parallel pseudocode. We focus on three programming frameworks: OpenMP Tasking, C++ standard parallelism, and the asynchronous many-task runtime HPX. Each framework offers different levels of abstraction and control for task execution. We evaluate LLM-generated solutions for correctness and scalability. Our results reveal both strengths and weaknesses of LLMs with regard to problem complexity and framework. Finally, we discuss what these findings mean for future LLM-assisted development in high-performance and scientific computing.
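For readers unfamiliar with the term, the following is a minimal sketch of what "task-based parallel code" can look like, written with standard C++ futures only. It is an illustration under stated assumptions, not one of the paper's benchmarks or generated solutions: the function name `task_sum`, the `cutoff` parameter, and the choice of a recursive sum are all hypothetical. The same spawn-and-join structure would roughly correspond to `#pragma omp task` with `#pragma omp taskwait` in OpenMP Tasking, or to `hpx::async` with `hpx::future` in HPX.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the paper): sums data[lo, hi) by recursive
// task decomposition. Ranges of at most `cutoff` elements are summed
// sequentially so that task-creation overhead stays bounded.
long long task_sum(const std::vector<int>& data, std::size_t lo, std::size_t hi,
                   std::size_t cutoff = 1024) {
    if (hi - lo <= cutoff) {
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(lo);
        const auto last = data.begin() + static_cast<std::ptrdiff_t>(hi);
        return std::accumulate(first, last, 0LL);
    }

    const std::size_t mid = lo + (hi - lo) / 2;

    // Spawn the left half as an asynchronous task. std::launch::async starts
    // a new thread per task, which is acceptable for a sketch; task runtimes
    // typically schedule lightweight tasks on a fixed pool of workers instead.
    std::future<long long> left = std::async(
        std::launch::async,
        [&data, lo, mid, cutoff] { return task_sum(data, lo, mid, cutoff); });

    // Compute the right half on the current thread, then join the child task.
    const long long right = task_sum(data, mid, hi, cutoff);
    return left.get() + right;
}
```

Calling `task_sum(v, 0, v.size())` on a vector `v` returns the sum of its elements. The cutoff matters for the scalability question the abstract raises: a value that is too small creates many tiny tasks whose overhead dominates, while a value that is too large leaves cores idle.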